ABSTRACT

This final technical report outlines the research and main results obtained during the period from May 1, 2006 through October 31, 2011 of the
MURI project. The objective was to develop a general and systematic foundation and algorithms for spatiotemporal statistical inference and
for fusion of heterogeneous information from multi-source, multi-sensor distributed sensor networks. Immediate applications of the
proposed work are Network Centric Warfare, where new and emerging systems such as MASINT and FORCENet collect but do not
adequately interpret vast amounts of data; information assurance and network security; and homeland security applications, including video
monitoring and near-field and far-field intelligence analysis. Our research targeted three central problems: (a)
nonstationarity, (b) integrating metric and symbolic information, and (c) very high dimensionality. Current methods for pattern recognition
in monitoring and surveillance are designed for stationary patterns and cannot cope with new patterns in ever-changing environments. We
developed new statistical methods for the nonstationary environment, particularly spatiotemporal nonlinear filtering, changepoint detection,
and advanced fusion methods. A distinctive feature of our approach is that the spaces in which estimation, classification, and tracking are
performed are both metric and symbolic.
6. A.P. Brown and A.G. Tartakovsky, “Spatiotemporal Clutter Rejection and Track-Before-Detect Methods for Tracking Small Dim
Objects,”  32nd  Review  of  Atmospheric  Transmission  Models  Meeting,  Lexington,  Massachusetts,  14-15  June  2010  (Invited). 

7.  A.G.  Tartakovsky,  “Spatial-Temporal  Image  Processing  Techniques  and  Applications  to  Remote  Sensing,”  Department  of  Mathematics, 
Stanford  University,  2009  (Invited). 

8.  A.G.  Tartakovsky,  “Quickest  Changepoint  Detection:  Recent  Advances  and  Open  Problems,”  NSF  Sponsored  Workshop  in  Honor  of 
Professor  A.V.  Balakrishnan,  January  30,  2009  (Invited). 

9. A.G. Tartakovsky, “Efficient Numerical Methods for Optimization and Performance Evaluation of Changepoint Detection Procedures,”
Department  of  Probability,  Moscow  State  University,  Moscow,  Russia,  March  25,  2009  (Invited). 

10. A.G. Tartakovsky, “Adaptive Spatial-Temporal Image Processing Techniques and Applications to Clutter Rejection in Remote
Sensing,” Workshop “Spatiotemporal Image Processing and Visual Surveillance”, University of Southern California, 2008 (Invited).

11. A.G. Tartakovsky, “Exact Optimality of the Shiryaev-Roberts Procedure for Detecting Changes in Distributions,” Department of
Mathematical  Sciences,  University  of  Technology,  Sydney,  Australia,  November  27,  2008  (Invited). 

12. A.G. Tartakovsky, “Detection and Classification in Distributed Multisensor Systems with Applications to Network Security,”
Workshop “Sensor Networks and Future Internet Security”, University of Southern California, May 23, 2007 (Invited).

13.  A.G.  Tartakovsky,  “Asymptotic  Optimality  in  Sequential  Quickest  Change-Point  Detection:  Theory  and  Applications,”  Princeton, 
September  25,  2007  (Invited). 

14.  A.G.  Tartakovsky,  “Quickest  Change-Point  Detection:  Previous  Achievements  and  Open  Problems,”  First  International  Workshop  on 
Sequential  Methodologies,  Auburn,  AL,  22-25  July  2007  (Invited). 

15.  A.G.  Tartakovsky,  “Asymptotic  Optimality  in  Sequential  Hypothesis  Testing  and  Quickest  Change-Point  Detection  for  General 
Continuous-Time  Stochastic  Processes,”  Workshop  on  Inverse  Problems  in  Stochastic  Differential  Equations,  University  of  Southern 
California,  Los  Angeles,  CA,  22-26  May,  2007  (Invited). 

16. A.G. Tartakovsky, “An Asymptotically Optimal Change Detection Strategy Under Nontraditional Global False Alarm Probability
Constraint,” The 2007 Taipei International Statistical Symposium and ICSA International Conference (Session: Change-Point Analysis and
Applications), Taipei, Taiwan, 24-28 June, 2007 (Invited).

17.  V.V.  Veeravalli  and  A.G.  Tartakovsky,  “Quickest  Change  Detection  in  Sensor  Networks,”  First  International  Workshop  on  Sequential 
Methodologies,  Auburn,  AL,  July  2007  (Invited). 

18. B.L. Rozovsky, “Generalized Malliavin calculus and Stochastic PDEs,” Columbia University, Minerva Foundation Lectures, December 2010.

19.  B.L.  Rozovsky,  “Stochastic  Fluid  Dynamics,”  NSF  Institute  for  Pure  and  Applied  Mathematics,  Invited  lecture,  January  2011. 

20.  B.L.  Rozovsky,  “Stochastic  Fluids  and  Malliavin  Calculus,”  Conference  on  Malliavin  Calculus  and  Stochastic  Analysis,  University  of 
Kansas,  Invited  talk,  March  2011. 

21. B.L. Rozovsky, “On Unbiased Stochastic Navier-Stokes Equation,” Workshop on SPDEs, Archimedes Center for Modeling, Analysis,
and  Computations,  Heraklion,  Greece,  Invited  lecture,  June  2011. 

22.  B.L.  Rozovsky,  “Recent  Advances  in  Nonlinear  Filtering,”  Imperial  College,  London.  Invited  lecture,  June  2011. 

23.  B.L.  Rozovsky,  “Stochastic  Fluid  Dynamics  and  Malliavin  Calculus,”  Oxford  University,  Invited  lecture,  2011. 

24. B.L. Rozovsky, “Uncertainty Quantification and Nonlinear Filtering,” ICIAM 2011, Vancouver, Canada, 2011.

25.  B.L.  Rozovsky,  “On  Unbiased  Stochastic  Navier-Stokes  Equation,”  ICIAM  2011,  Vancouver,  Canada,  2011. 

26. B.L. Rozovskii, Invited Talk, SIAM conference on Computational Science and Engineering, Miami, 2009.

27.  B.L.  Rozovskii,  Invited  Talk,  7th  ISAAC  Congress,  London,  2009. 

28. B.L. Rozovskii, Invited Talk, International Conference on Spectral and High Order Methods, 2009, Trondheim, Norway.

29. Alethea Barbaro, Agent-based Complex Systems Workshop at IPAM: Organized and spoke October 14, 2009, 1 hour, Title:
“Agent-based modeling for animal migration and gang behavior”.

30. Alethea Barbaro, American Soc. Criminology Meeting Philadelphia, joint presentation with Shannon Reid, Nov. 6, 2009, “Agent-based
simulations: modeling gang violence in Hollenbeck”.

31. Alethea Barbaro, UCSB Hypatian Seminar, Nov 30, 2010, “Agent-based modeling of complex systems, and how to claim your
mathematical territory after your doctorate”.


32. Alethea Barbaro, IPAM’s Optimal Transport Reunion Workshop at Lake Arrowhead (invited talk) December 10, 2009, 20 minutes,
Title: “On limits of a discrete time interacting particle system”.

33. Alethea Barbaro, 2nd Annual Southern California Women in Math Symposium February 20, 2010,
30 minutes, Title: “Agent-based models of social dynamics”.

34. Alethea Barbaro, 2010 Mathematics Festival at UCLA (2 sessions) February 13, 2010, Two sessions, each 50 minutes, Title: “Modeling
the Real World: Using Math to Study Migration, Territoriality, and Social Networks”.

35. Alethea Barbaro, Invited Talk at USC for the Women in Math Seminar, March 12, 2010, 1 hour, Title: “Simulating social dynamics
with interacting particle models”.

36. Alethea Barbaro, Invited Seminar Talk at Redlands, March 31, 2010, 1 hour, Title: “Simulating Social Dynamics with Interacting
Particle Models”.

37. Alethea Barbaro, Talk at SIAM’s DSPDEs conference in Barcelona, Spain (Organized Mini-symposium and spoke), Mini-symposium
title: Particle and mean field models for flocking and swarming, co-chair Massimo Fornasier, June 1, 2010, 30 minutes, Talk Title:
“Interacting particle models for Social Dynamics”.

38. Alethea Barbaro, Workshop: Modeling Complex Dynamics in Biological Systems, Universite Paul Sabatier, Toulouse, France (invited
talk) June 9, 2010, 1 hour, Title: “Interacting particle models for animal social dynamics”.

39. Alethea Barbaro, Workshop: Mathematics of Complex Systems, Universite Paul Sabatier, Toulouse, France (invited talk) June 10,
2010, 45 minutes, Title: “Agent-based models for gang dynamics”.

40. Alethea Barbaro, Kinetic and Mean-field models in the Socio-Economic Sciences: workshop at ICMS, Edinburgh, Scotland (25 minute
invited talk), July 31, 2009, Title: “Fish migration, interacting particles, and scaling laws”.

41.  Andrea  Bertozzi,  Women  in  Mathematics  Seminar,  Univ.  of  Wisconsin,  Madison,  WI,  October  7,  2009. 

42. Andrea Bertozzi, Invited talk and co-organizer, IPAM workshop on “Agent Based Complex Systems”, October 14, 2009.

43.  Andrea  Bertozzi,  Invited  address,  Southern  California-Nevada  MAA  Section  Meeting,  October  17,  2009. 

44. Andrea Bertozzi, Invited talk, Workshop on Self-Organization and Multi-Scale Mathematical Modeling of Active Biological Systems,
Statistical and Applied Mathematical Sciences Institute, Durham, NC, October 27, 2009.

45.  Andrea  Bertozzi,  Invited  talk,  Army  Research  Office,  Durham,  NC  October  28,  2009. 

46.  Andrea  Bertozzi,  Invited  talk  UBC  Vancouver,  PIMS  mini-symposium  in  PDE,  one  hour  talk,  November  13,  2009. 

47. Andrea Bertozzi, Invited talk on “A Variational Approach to Hyperspectral Image Fusion”, Minisymposium on Variational Methods in
Image Processing and Interface Problems, Maria Westdickenberg and Sung Ha Kang, Organizers, SIAM Conference on Analysis of Partial
Differential Equations, Miami, December 7, 2009.

48. Andrea Bertozzi, Invited talk on “Mathematical Models for Urban Crime”, Minisymposium on “Nonlinear Stochastic PDEs and
Applications to Complex Systems”, Hakima Bessaih and Bjorn Birnir, Organizers, SIAM Conference on Analysis of Partial Differential
Equations, Miami, December 8, 2009.

49.  Andrea  Bertozzi,  Invited  talk  in  SIAM  Minisymposium  on  New  Trends  in  Mathematical  Methods  in  Imaging  Science,  Rick  Chartrand, 
Stacey  Levine,  Jennifer  Mueller,  and  Luminita  Vese  organizers,  Joint  Math  Meetings,  San  Francisco,  Sat  Jan  16,  2010. 

50.  Andrea  Bertozzi,  Invited  talk,  China  Lake  Distinguished  Speakers  Colloquium  Series,  China  Lake  Naval  Air  Warfare  Center, 
Ridgecrest,  CA,  Feb  2,  2010. 

51. Andrea Bertozzi, Invited talk, Rand Corp., Santa Monica, Feb 11, 2010.

52. Andrea Bertozzi, Invited talk, Session on “Traffic, Crowds and Society”, AAAS Annual Meeting, San Diego, February 20, 2010.

53.  Andrea  Bertozzi,  Invited  talk,  Imperial  College  London,  Institute  for  Mathematical  Sciences,  invited  talk  in  three  part  session  on 
Geometric  Mechanics,  Darryl  Holm  host,  March  8,  2010. 

54.  Andrea  Bertozzi,  Fluid  Mechanics  Seminar,  DAMTP,  Univ.  of  Cambridge,  UK,  March  5,  2010. 

55. Andrea Bertozzi, Brown University, Mathematics Department, Distinguished Lecture Series, three one hour lectures, March 11-12, 2010.

56.  Andrea  Bertozzi,  Brown  University,  Mathematics  Department,  faculty  speaker,  Symposium  for  Undergraduates  in  the  Mathematical 
Sciences,  45  minute  talk,  March  13,  2010. 

57. Andrea Bertozzi, Invited talk, Minisymposium on Advanced Frameworks for Restructuring High Dimensional Datasets, SIAM Conf.
on Imaging Science, Chicago, IL, April 13, 2010, Edward H. Bosch, Organizer.

58.  Andrea  Bertozzi,  Invited  talk,  Plenary  talk,  Joint  SIAM/RSME-SCM-SEMA  Meeting  on  Emerging  Topics  in  Dynamical  Systems  and 
Partial  Differential  Equations  DSPDEs’10  June  1,  2010,  Barcelona,  Spain. 

59.  Andrea  Bertozzi,  Invited  talk  2010  DTRA/NSF  Algorithm  workshop,  talk  on  “Undergraduate  Research  Training  in  Defense 
Applications”,  June  22,  2010,  Chapel  Hill,  NC. 

60. Andrea Bertozzi, Invited talk 2010 DTRA/NSF Algorithm workshop, talk on “Imaging of multispectral and hyperspectral data”, June
23, 2010, Chapel Hill, NC.

61. Andrea Bertozzi, invited talk in workshop Fluid Dynamics Analysis and Numerics, a conference in honor of Tom Beale’s 60th
Birthday, Duke Univ., Durham, NC, June 28, 2010.

62.  Andrea  Bertozzi,  invited  talk  at  Park  City  Mathematics  Institute,  Program  on  Imaging  Sciences,  Park  City  UT,  July  5,  2010. 

63. Andrea Bertozzi, Graduate School of Engineering and Applied Sciences, Distinguished Lecture, Naval Postgraduate School, Sept. 2, 2010.

64. Andrea Bertozzi, London Taught Course Centre 8 hour intensive course on Mathematics of Crime, Univ. College London, Sept. 9-10, 2010.

65.  Andrea  Bertozzi,  Department  of  Applied  Mathematics  and  Statistics  Johns  Hopkins  University,  Colloquium  Sept.  16,  2010. 

66.  Andrea  Bertozzi,  Allman  Family  Public  Lecture,  Southern  Methodist  University,  Mathematics  in  the  Real  World,  Sept.  23,  2010. 

67. Andrea Bertozzi, Invited talk, IPAM workshop on Machine Reasoning: Mission Focused Actions/Reactions Based on System
Integration  of  Information  Derived  from  Complex  Real-World  Data,  Oct  19,  2010. 

68.  Andrea  Bertozzi,  Distinguished  Lecture,  Department  of  Mathematics,  Simon  Fraser  Univ.,  Oct.  29,  2010. 

69. Andrea Bertozzi, Invited talk, 9th Annual Image Fusion Workshop, Institute for Defense and Government Advancement, Tyson’s
Corner, VA, November 16, 2010.

70.  Andrea  Bertozzi,  Invited  talk,  RCIM  Symposium  Mathematical  Aspects  of  Image  Processing  and  Computer  Vision  2010  Sapporo, 
Japan,  November  26,  2010. 

71. Andrea Bertozzi, Invited talk, NSF workshop on New Directions in Dynamical Systems Inspired by Biological, Energy,
Environmental, and Information Sciences, Atlanta, GA, Jan 4, 2011.

72. Andrea Bertozzi, Invited talk, Dynamics Days, Chapel Hill, NC, Jan 5, 2011.

73. Andrea Bertozzi, AMS Invited Address, Joint Mathematics Meetings, New Orleans, LA, Jan 7, 2011.

74. Andrea Bertozzi, Invited Talk (one hour), 2011 annual meeting of the Australian and New Zealand Industrial and Applied Mathematics
division of the Australian Mathematical Society, ANZIAM 2011, in Glenelg, Australia, Feb. 1, 2011.

75. Andrea Bertozzi, PIMS Applied Mathematics Seminar, University of Saskatchewan, Saskatoon, March 14, 2011.

76. Andrea Bertozzi, Seminar, Ecole Normale Superieure de Cachan, Centre de Mathematiques et de leurs Applications, March 17, 2011.

77.  Andrea  Bertozzi,  Groupe  de  travail  -  Mathematiques  de  la  decision,  Seminar,  Univ.  of  Toulouse,  March  24,  2011. 

78. Andrea Bertozzi, Colloquium de l’Institut de Mathematiques de Toulouse, March 25, 2011.

79. Andrea Bertozzi, Mathematics Colloquium, Univ. of Warwick, UK, June 3, 2011, “Mathematics of Crime”.

80.  Andrea  Bertozzi,  Nonlinear  Diffusion:  Applications,  Analysis  and  Computation  conference  to  celebrate  the  60th  Birthday  of  Charlie 
Elliot,  Univ.  Warwick,  June  6-8,  2011,  invited  45  minute  talk. 

81. Andrea Bertozzi, 7th East Asian SIAM meeting, Waseda University Kitakyushu Campus, Japan, Keynote Talk, June 29, 2011.

82. Andrea Bertozzi, Invited talk, Minisymposium on Modern Methods and Applications of the Calculus of Variations: Image Processing,
July 20, 2011, International Congress on Industrial and Applied Mathematics, Vancouver BC.

83.  Andrea  Bertozzi,  Invited  talk,  Duke  Workshop  on  Sensing  and  Analysis  of  High-Dimensional  Data  (SAHD),  July  26,  2011. 

84.  Andrea  Bertozzi,  Plenary  talk,  AWM  40th  Anniversary  Conference,  ICERM,  Brown  University,  September  18,  2011. 

85. Andrea Bertozzi, Applied Mathematics Seminar, Mathematics of Crime, Harvard University, September 19, 2011.

86.  Andrea  Bertozzi, Widely  Applied  Mathematics  Seminar,  Swarming  by  Nature  and  by  Design,  Harvard  University,  September  20,  2011. 

87. P. Jeffrey Brantingham, Repeats and Reprisals: The Dynamics of Burglary and Rival Gang Violence in Los Angeles. Invited lecture at
the Workshop on Modeling and Analysis of Security, January 4-7, 2010, University of Chile.

88.  P.  Jeffrey  Brantingham,  “Agent-based  and  continuum  models  of  crime  pattern  formation,”  Invited  lecture  presented  at  the  Agent-based 
Complex  Systems  workshop,  Institute  of  Pure  and  Applied  Mathematics,  UCLA,  October  12-14,  2009. 

89.  P.  Jeffrey  Brantingham,  “Why  seeking  to  reduce  gang  rivalries  might  increase  gang  violence,”  UC  Irvine  Criminology,  Law  and 
Society, April 13, 2011.

90. P. Jeffrey Brantingham, “The Mathematical Ecology of Criminal Street Gangs,” UCLA Marschak Colloquium, April 8, 2011.

91.  P.  Jeffrey  Brantingham,  Stochastic  Models  of  Crime  with  Practical  Implications  for  Policing.  Workshop  on  Geospatial  Abduction 
Problems,  University  of  Maryland,  March  3-4,  2011. 

92.  P.  Jeffrey  Brantingham,  “University-Agency  Collaboration  in  Predictive  Policing,”  10th  Anniversary  Celebration  of  the  Institute  for 
Canadian Urban Research, Simon Fraser University, February 3, 2011.

93.  Maria  D'Orsogna,  invited  talk,  Kinetic  and  mean-field  models  in  the  socio-economic  sciences,  Edinburgh,  Scotland,  July  2009 

94. Erik Lewis, poster presentation at the Agent-Based Complex Systems workshop at IPAM, October 12-14, 2009. “Comparing Gang
Rivalries and Civilian Deaths in Iraq Using Self-Exciting Point Processes.”

95. George Mohler, “Agent-based, Bayesian Geographic Profiling,” Workshop on Analysis and Modeling of Security, Jan 4-7, 2010,
Santiago, Chile, 40 minute talk.


96.  George  Mohler,  “Crime  as  a  Self-Exciting  Point  Process:  An  innovative  approach  in  crime  prediction,”  American  Society  of 
Criminology  annual  meeting,  Nov  4-7,  2009,  20  minute  talk. 

97.  Todd  Wittman,  Contributed  talk.  25  minutes.  “Image  Processing  in  the  UCLA  REU  Program.”  International  Conference  on  Technology 
in  Collegiate  Mathematics.  Chicago,  IL.  March  2010. 

98. Todd Wittman, Contributed talk. 20 minutes. “Problems in Geospatial Image Processing.” Center for Nonlinear Analysis Summer
School  on  Image  Processing  and  PDEs.  Pittsburgh,  PA.  June  2010. 

99.  Todd  Wittman,  “The  UCLA  Math  REU  Program:  Getting  Students  Involved  in  Research.”  University  of  Southern  California, 
Department  of  Mathematics  Colloquium.  Los  Angeles,  CA.  April  2010. 

100.  Todd  Wittman,  Contributed  talk.  25  minutes.  “Variational  Methods  in  Hyperspectral  Image  Processing.”  SIAM  Conference  on 
Analysis  of  Partial  Differential  Equations.  Miami,  FL.  December  2009. 

101.  Y.  S.  Cho,  G.  Ver  Steeg,  and  A.  Galstyan  “Co-Evolving  Mixed-Membership  Blockmodels”,  NIPS  Workshop  on  Networks  Across 
Disciplines,  2010. 

102.  A.  Allahverdyan,  A.  Galstyan,  and  G.V.  Steeg,  “Clustering  with  Prior  Information,”  NIPS  Workshop:  Clustering:  Science  or  Art? 
Towards  Principled  Approaches,  2009. 

103.  A.  Galstyan,  “Modeling  Covert  Activities  with  Hidden  Markov  Processes,”  SIAM  CADS  Mini-symposium  on  Terrorism  Modeled  as 
a  Dynamical  System,  Snowbird,  Utah,  2009  (invited). 

104. A. Galstyan and P.R. Cohen, “Comparing Diffusion Models for Graph-Based Semi-Supervised Learning,” 6th International
Workshop on Mining and Learning with Graphs (MLG-08), Helsinki, Finland, 2008.

105.  A.  Galstyan  and  P.R.  Cohen,  “Influence  Propagation  in  Modular  Networks,”  AAAI  Symposium  on  Social  Information  Processing 
(SIP-08), Stanford, CA, 2008.

106.  A.  Galstyan,  S.  Mitra,  and  P.R.  Cohen,  “Probabilistic  Tracking  of  Plans  and  Intentions  in  Intelligence  Analysis”,  talk  presented  at  the 
WNAR/IMS  Annual  Meeting,  UC  Irvine,  June  2007. 

107.  A.  Galstyan,  S.  Mitra,  and  P.R.  Cohen,  “Detecting  and  Tracking  Hostile  Plans  in  the  Hats  World,”  AAAI  Workshop  on  Plan,  Activity 
and  Intent  Recognition  (PAIR-07),  Vancouver,  Canada,  2007. 

108. A. Galstyan, S. Mitra, and P.R. Cohen, “Probabilistic Plan Tracking and Detection for Intelligence Analysis,” poster presented at the
Joint  Statistical  Meetings  (JSM),  Salt  Lake  City,  July  2007. 

109.  G.  Medioni,  Keynote  lecture,  Workshop  on  Perceptual  Organization,  San  Francisco,  CA,  June  13,  2010. 

110. G. Medioni, Keynote lecture, International Workshop on Computer Vision, Shenzhen Institute of Advanced Technology, Chinese
Academy  of  Sciences,  Shenzhen,  China,  July  14,  2010. 

111. G. Medioni, Invited lecture, “Recent progress in object tracking (Multi target tracking, tag and track, active tracking, tracking in
flow)”,  INRIA,  Rocquencourt,  France,  October  2009. 

112. G. Medioni, Keynote speaker, Los Angeles/Anaheim, 2009 World Congress on Computer Science and Information Engineering,
March  31,  2009. 

113. G. Medioni, Keynote speaker, San Diego (Coronado), Automated Imaging, February 4, 2009.

114. G. Medioni, Keynote speaker, ISVC, Las Vegas, November 22, 2008.

115. G. Medioni, “Tensor Voting in 2 to N dimensions: Fundamental Elements,” Distinguished Lecture, Brown University, September 15,
2008. 

116.  V.V.  Veeravalli,  “Sensor  Control  for  Information  Collection  and  Fusion.”  International  Workshop  on  Information  Fusion,  Xi’an, 
China,  August  2011  (Plenary  Lecture). 

117.  V.V.  Veeravalli  and  T.  Banerjee,  “Quickest  Change  Detection  with  On-Off  Observation  Control.”  International  Workshop  in 
Sequential  Methodologies,  Palo  Alto,  CA,  June  2011  (Invited). 

118.  V.V.  Veeravalli  and  J.  Fuemmeler,  “Energy-Efficient  Multi-Target  Tracking  Using  Sensor  Networks.”  Army  Conference  on  Applied 
Statistics  (ACAS),  Lexington,  VA,  October  2008  (Invited). 

119. V.V. Veeravalli, “System-Theoretic Foundations for Sensor Networks.” IEEE Communication Theory Workshop, Sedona, AZ, May
2007  (Keynote  Lecture). 

120. V.V. Veeravalli, “System-Theoretic Foundations for Sensor Networks.” IWWAN, New York, NY, June 2006 (Keynote Lecture).

121.  V.V.  Veeravalli,  “Smart  Sleeping  Policies  for  Wireless  Sensor  Networks.”  NSF  Workshop  on  Future  Directions  in  Networked 
Sensing,  Boston,  MA,  May  2006  (Invited). 

Number  of  Presentations:  121.00 


Non  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Received  Paper 


TOTAL: 


Number  of  Non  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Received 
2011/11/11 1
Peer-Reviewed Conference Proceeding publications (other than abstracts):


Paper 

98 Steve DiBenedetto, Kaustubh Gadkari, Nicholas Diel, Andrea Steiner, Dan Massey, Christos Papadopoulos.
Fingerprinting custom botnet protocol stacks, 2010 6th IEEE Workshop on Secure Network Protocols (NPSec).
2010/10/04 03:00:00, Kyoto, Japan.

97 Genevieve Bartlett, John Heidemann, Christos Papadopoulos. Low-rate, flow-level periodicity detection, IEEE
INFOCOM 2011 - IEEE Conference on Computer Communications Workshops. 2011/04/09 03:00:00,
Shanghai, China.

96 Chris Wilcox, Christos Papadopoulos, John Heidemann. Correlating Spam Activity with IP Address
Characteristics, IEEE INFOCOM 2010 - IEEE Conference on Computer Communications Workshops.
2010/03/14 04:00:00, San Diego, CA, USA.

95 Genevieve Bartlett, John Heidemann, Christos Papadopoulos. Inherent Behaviors for On-line Detection of
Peer-to-Peer File Sharing, 2007 IEEE Global Internet Symposium. 2007/05/10 03:00:00, Anchorage, AK, USA.

94 A. Hussain, J. Heidemann, C. Papadopoulos. Identification of Repeated Denial of Service Attacks,
Proceedings IEEE INFOCOM 2006. 25th IEEE International Conference on Computer Communications.
2006/04/22 03:00:00, Barcelona, Spain.

93 V.V. Veeravalli, J.A. Fuemmeler. Efficient Tracking in a Network of Sleepy Sensors, 2006 IEEE International
Conference on Acoustics Speech and Signal Processing. 2006/07/24 03:00:00, Toulouse, France.

92 Jason A. Fuemmeler, Venugopal V. Veeravalli. Sensor scheduling for effective and energy efficient tracking in
sensor networks, 2007 46th IEEE Conference on Decision and Control. 2007/12/11 03:00:00, New Orleans,
LA, USA.

91 Vasanthan Raghavan, Venugopal V. Veeravalli. Bayesian quickest change process detection, 2009 IEEE
International Symposium on Information Theory - ISIT. 2009/06/27 03:00:00, Seoul, South Korea.

90 Yuping Lin, Qian Yu, Gerard Medioni. Map-Enhanced UAV Image Sequence Registration, 2007 IEEE
Workshop on Applications of Computer Vision (WACV '07). 2007/02/20 03:00:00, Austin, TX, USA.

89 Qian Yu, Gerard Medioni. Map-Enhanced Detection and Tracking from a Moving Platform with Local and
Global Data Association, 2007 IEEE Workshop on Motion and Video Computing (WMVC'07). 2007/02/22
03:00:00, Austin, TX, USA.

88 Qian Yu, Gerard Medioni, Isaac Cohen. Multiple Target Tracking Using Spatio-Temporal Markov Chain Monte
Carlo Data Association, 2007 IEEE Conference on Computer Vision and Pattern Recognition. 2007/06/16
03:00:00, Minneapolis, MN, USA.

87 Qian Yu, Gerard Medioni. A GPU-based implementation of motion detection from a moving platform, 2008
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR
Workshops). 2008/06/22 03:00:00, Anchorage, AK, USA.

86 Qian Yu, Gerard Medioni. Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven
MCMC Data Association, 2008 IEEE Workshop on Motion and Video Computing (WMVC). 2008/01/07
03:00:00, Copper Mountain, CO, USA.

85 Qian Yu, Thang Ba Dinh, Gerard Medioni. Online Tracking and Reacquisition Using Co-trained Generative and
Discriminative Trackers, 10th European Conference on Computer Vision. 2008/01/01 03:00:00.

84 Thang Dinh, Qian Yu, Gerard Medioni. Real time tracking using an active pan-tilt-zoom network camera, 2009
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009). 2009/10/09 03:00:00, St.
Louis, MO, USA.


2011/11/11 1 83 Qian Yu, G. Medioni. Motion pattern interpretation and detection for tracking moving vehicles in airborne video,
2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR
Workshops). 2009/06/19 03:00:00, Miami, FL.

2011/11/11 1 82 Thang Ba Dinh, Gerard Medioni. Co-training framework of generative and discriminative trackers with partial
occlusion handling, 2011 IEEE Workshop on Applications of Computer Vision (WACV). 2011/01/04 03:00:00,
Kona, HI, USA.

2011/11/11 1 81 Thang Ba Dinh, Nam Vo, Gerard Medioni. High resolution face sequences from a PTZ network camera,
Gesture Recognition (FG 2011). 2011/03/20 03:00:00, Santa Barbara, CA, USA.

2011/11/11 1 80 Thang Ba Dinh, Nam Vo, Gerard Medioni. Context tracker: Exploring supporters and distracters in
unconstrained environments, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
2011/06/19 03:00:00, Colorado Springs, CO, USA.

2011/11/11 1 79 Aram Galstyan, Paul Cohen. Relational Classification Through Three-State Epidemic Dynamics, 2006 9th
International Conference on Information Fusion. 2006/07/09 03:00:00, Florence.

2011/11/11 1 78 Armen Allahverdyan, Aram Galstyan. On maximum a posteriori estimation of hidden Markov processes,
Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. 2009/01/01 03:00:00.

2011/11/10 2: 77 Andrea L. Bertozzi, Zhipu Jin. Environmental boundary tracking and estimation using multiple autonomous
vehicles, 2007 46th IEEE Conference on Decision and Control. 2007/12/11 03:00:00, New Orleans, LA, USA.

2011/11/10 2: 76 Wangyi Liu, Martin B. Short, Yasser E. Taima, Andrea L. Bertozzi. Multiscale Collaborative Searching Through
Swarming, 7th International Conference on Informatics in Control, Automation, and Robotics. 2010/06/12
03:00:00.

2011/11/10 2: 75 Yao-Li Chuang, Yuan R. Huang, Maria R. D'Orsogna, Andrea L. Bertozzi. Multi-Vehicle Flocking: Scalability of
Cooperative Control Algorithms using Pairwise Potentials, 2007 IEEE International Conference on Robotics
and Automation. 2007/04/09 03:00:00, Rome, Italy.

2011/11/10 2: 74 Kevin K. Leung, Chung H. Hsieh, Yuan R. Huang, Abhijeet Joshi, Vlad Voroninski, Andrea L. Bertozzi. A
second generation micro-vehicle testbed for cooperative control and sensing strategies, 2007 American
Control Conference. 2007/07/08 03:00:00, New York, NY, USA.

2011/11/10 2: 73 Y. Landa, D. Galkowski, Y. R. Huang, A. Joshi, C. Lee, K. K. Leung, G. Malla, J. Treanor, V. Voroninski, A. L.
Bertozzi, R. Tsai. Robotic path planning and visibility with limited sensor data, American Control Conference.
2007/07/30 03:00:00.

2011/11/10 2: 72 A. Joshi, T. Ashley, Y. Huang, A.L. Bertozzi. Experimental validation of cooperative environmental boundary
tracking with on-board sensors, American Control Conference. 2009/06/01 03:00:00.

2011/11/10 2: 71 J.H. von Brecht, S. Thiruvenadam, T.F. Chan. OCCLUSION TRACKING USING LOGIC MODELS, 9th IASTED
Conference on Signal and Image Processing. 2007/01/01 03:00:00.

2011/11/10 2: 70 M. Gonzalez, X. Huang, B. Irvine, D. S. Hermina Martinez, C. H. Hsieh, Y. R. Huang, M. B. Short, A. L.
Bertozzi. A Third Generation Micro-vehicle Testbed for Cooperative Control and Sensing Strategies, 8th
International Conference on Informatics in Control, Automation and Robotics (ICINCO). 2011/01/01 03:00:00.


2011/11/10 2: 69 Alexander G. Tartakovsky, H. Kim. Performance of Certain Decentralized Distributed Change Detection
Procedures,  9th  International  Conference  on  Information  Fusion.  2006/07/10  03:00:00,  .  : , 

2011/11/10  2:  68  Alexander  G.  Tartakovsky,  Aleksey  S.  Polunchenko.  Decentralized  Quickest  Change  Detection  in  Distributed 
Sensor  Systems  with  Applications  to  Information  Assurance  and  Counter  Terrorism,  13th  Annual  Army 
Conference  on  Applied  Statistics.  2007/10/18  03:00:00,  .  : , 

2011/11/10 2: 67 Moshe Pollak, Alexander G. Tartakovsky, Aleksey S. Polunchenko. Asymptotic Exponentiality of First Exit
Times for Recurrent Markov Processes and Applications to Changepoint Detection, 2008 International
Workshop  on  Applied  Probability.  2008/07/07  03:00:00,  .  :  , 


2011/11/10 2: 66 Alexander G. Tartakovsky, Aleksey S. Polunchenko. Quickest Changepoint Detection in Distributed
Multisensor Systems under Unknown Parameters, 11th International Conference on Information Fusion.
2008/07/01  03:00:00,  .  :  , 

2011/11/10  2:  65  Alexander  G.  Tartakovsky,  James  Brown,  Andrew  Brown.  Nonstationary  EO/IR  Clutter  Suppression  and  Dim 
Object  Tracking,  2010  Advanced  Maui  Optical  and  Space  Surveillance  Technologies  (AMOS)  Conference. 
2010/09/20  03:00:00,  .  :  , 

2011/11/10 2: 64 George V. Moustakides, Alexander G. Tartakovsky, Aleksey S. Polunchenko. Design and Comparison of
Shiryaev-Roberts and CUSUM-Type Change-Point Detection Procedures, The Second International Workshop
on Sequential Methodologies. 2009/06/16 03:00:00, . : ,

2011/11/10 2: 63 Nitis Mukhopadhyay, Alexander Tartakovsky, Aleksey Polunchenko. Nearly Optimal Change-Point Detection
with An Application to Cybersecurity, The Third International Workshop in Sequential Methodologies.
2011/06/15  03:00:00,  .  :  , 

2011/11/10 2: 62 Moshe Pollak, Alexander G. Tartakovsky. Nearly Minimax Changepoint Detection Procedures, IEEE
International  Symposium  on  Information  Theory.  2011/08/01  03:00:00, .  :  , 

2011/11/10 2: 61 Alexander G. Tartakovsky, Moshe Pollak. Monotone Properties of the First Exit Time of a Markov Process
Started  at  a  Quasi-stationary  Distribution,  Markov  and  Semi-Markov  Processes  and  Related  Fields  2011. 
2011/09/20  03:00:00,  .  :  , 

2011/11/09  T  54  Qian  Yu,  Gerard  Medioni,  Isaac  Cohen.  Multiple  Target  Tracking  Using  Spatio-Temporal  Markov  Chain  Monte 
Carlo  Data  Association,  2007  IEEE  Conference  on  Computer  Vision  and  Pattern  Recognition.  2007/06/16 
03:00:00,  Minneapolis,  MN,  USA.  :  , 

TOTAL:  39 

Number  of  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 

(d)  Manuscripts 

Received  Paper 

2011/11/10 II 7 Moshe Pollak, Alexander G. Tartakovsky. On the first exit time of a nonnegative Markov process started at a
quasistationary  distribution,  Journal  of  Applied  Probability  ( ) 

2011/11/09  i:  33  Sergey  V.  Lototsky,  Boris  L.  Rozovsky.  Stochastic  Parabolic  Equations  of  Full  Second  Order,  ( ) 

2011/11/02 II 3 Alexander G. Tartakovsky, Moshe Pollak, Aleksey S. Polunchenko. Third-order Asymptotic Optimality of the
Generalized Shiryaev-Roberts Detection Procedures, Theory of Probability and Its Applications ( )

2011/11/02  V  2  Georgios  Fellouris,  Alexander  G.  Tartakovsky.  Nearly  Minimax  Mixture  Rules  for  One-sided  Sequential 
Testing,  Sequential  Analysis  ( ) 

TOTAL:  4 

Number  of  Manuscripts: 

Books 

Received  Paper 

2011/11/09 1 53 Boris Rozovsky, Dan Crisan. The Oxford Handbook of Nonlinear Filtering, United Kingdom: Oxford University
Press, (01 2011)


TOTAL:  1 


Patents  Submitted 


1. G. Medioni and Q. Yu, USC File No: 4048, “Spatio-Temporal Multiple Target Tracking Using Markov Chain Monte
Carlo Data Association”.

2.  G.  Medioni  and  Q.  Yu,  USC  File  No:4109  “Online  Tracking  Using  Co-trained  Generative  and  Discriminative  Trackers”. 

3.  G.  Medioni  and  T.  B.  Dinh,  USC  File  No:  11-671,  “Visual  Tracking  in  Video  Images  in  Unconstrained  Environments  by 
Exploiting  On-The-Fly  Context  Using  Distracters  and  Supporters”. 

4.  Provisional  patent  application  filed  by  UCLA  with  US  Patent  Office  titled  “Data  Fusion  Mapping  Estimation”. 

Patents  Awarded 


Awards 

1.  T.B.  Dinh,  Travel  Grant  to  participate  in  the  IEEE  Conference  on  Computer  Vision  and  Pattern  Recognition  (CVPR), 
Colorado  Springs,  Colorado,  June  20-25,  2011. 

2. T.B. Dinh, Travel Grant from NSF to participate in the IEEE Conference on Automatic Face and Gesture Recognition (FG),
Santa  Barbara,  California,  March  21-25,  2011. 

3.  A.S.  Polunchenko,  Institute  of  Mathematical  Statistics’  (IMS)  Laha  Travel  Award,  2011. 

4.  Laura  Smith,  UCLA  Dissertation  Year  Fellowship,  2011. 

5.  Nancy  Rodriguez,  National  Science  Foundation  Postdoctoral  Fellowship,  2011. 

6.  Andrea  Bertozzi,  Elected  American  Academy  of  Arts  and  Sciences,  2010. 

7.  Andrea  Bertozzi,  Elected  SIAM  Fellow,  2010. 

8.  Tony  Chan,  Elected  SIAM  Fellow,  2010. 

9.  V.  Veeravalli  was  appointed  IEEE  Signal  Processing  Society  Distinguished  Lecturer  for  2010-2011. 

10.  Andrea  Bertozzi,  Sonia  Kovalevsky  Prize  Lecture,  SIAM  Annual  Meeting,  2009. 

11.  P.  Jeffrey  Brantingham,  (2009-Present)  Los  Angeles  Police  Department,  Community  Police  Advisory  Board  on 
Counter-Terrorism,  Appointed  Board  Member. 

12.  T.B.  Dinh,  1st  runner-up  presentation  award  in  scientific  sessions  of  Annual  Vietnam  Education  Foundation,  Albany, 
NY,  Jan  3-5,  2009. 

13. Vlad Voroninski, NSF Graduate Fellowship, 2008.

14.  Tony  Chan,  Elected  AAAS  Fellow,  2007. 

15.  Alexander  Tartakovsky,  Abraham  Wald  Prize  in  Sequential  Analysis,  2007. 


Graduate  Students 


NAME                     PERCENT SUPPORTED    Discipline
Greg Sokolov             0.50
Thang Ba Dinh            0.25
Jan Prokaj               0.25
Yoon Sik Cho             0.50
Ming Ji                  0.25
Alexander Chen           0.25
Nancy Rodriguez          0.25
Paul Jones               0.25
Matthew Keegan           0.25
Laura Smith              0.25
Erik Lewis               0.25
Rachel Hegemann          0.25
Wangyi Liu               0.25
Yasser Taima             0.25
Kevin Shen               0.25
David Hermina            0.25
James von Brecht         0.25
Jason Fuemmeler          0.50
Taposh Banerjee          0.25
Steven DiBenedetto       0.25
Kaustubh Gadkari         0.25
Mengran Hu               0.25
Zhang Han                0.25
Andrea Steiner           0.25
Andrew Papanicolaou      0.50
C.-Y. Lee                0.25

FTE Equivalent:          7.50
Total Number:            26

Names  of  Post  Doctorates 


NAME                     PERCENT SUPPORTED
J. Park                  0.25
Greg Ver Steeg           0.50
Aleksey Polunchenko      0.50
Vasanthan Raghavan       0.10
Georgios Fellouris       0.10
Berta Sandberg           0.25
Virginia Pasour          0.25
Zhipu Jin                0.25

FTE Equivalent:          2.20
Total Number:            8

Names  of  Faculty  Supported 


NAME                     PERCENT SUPPORTED    National Academy Member
Boris Rozovsky           0.10
Aram Galstyan            0.25
Paul Cohen               0.10
Gerard Medioni           0.10
Alexander Tartakovsky    0.30
Andrea Bertozzi          0.10
Tony Chan                0.10
P. Jeffrey Brantingham   0.10
Venugopal Veeravalli     0.20
Christos Papadopoulos    0.10

FTE Equivalent:          1.45
Total Number:            10

Names  of  Under  Graduate  students  supported 


NAME                     PERCENT SUPPORTED    Discipline
Marina Masaki            0.25                 Mathematics
Kym Louie                0.25                 Mathematics
Mark Allenby             0.25                 Mathematics
Mike Egesdal             0.25                 Mathematics
Chris Fathauer           0.25                 Mathematics
Jeremy Neumann           0.25                 Mathematics
Benjamin Irvine          0.25                 Mathematics
Max Gonzales             0.25                 Mathematics
Edwin Huang              0.25                 Mathematics
Abhijeet Joshi           0.25                 Mathematics
Kevin Leung              0.25                 Mathematics
Vlad Voroninski          0.25                 Mathematics
Trevor Ashley            0.25                 Mathematics
Andrea Steiner           0.25                 Computer and Computational Sciences

FTE Equivalent:          3.50
Total Number:            14

Student  Metrics 

This section only applies to graduating undergraduates supported by this agreement in this reporting period.

The number of undergraduates funded by this agreement who graduated during this period: 12.00

The number of undergraduates funded by this agreement who graduated during this period with a degree in science, mathematics, engineering, or technology fields: 12.00

The number of undergraduates funded by your agreement who graduated during this period and will continue to pursue a graduate or Ph.D. degree in science, mathematics, engineering, or technology fields: 10.00

Number of graduating undergraduates who achieved a 3.5 GPA to 4.0 (4.0 max scale): 12.00

Number of graduating undergraduates funded by a DoD funded Center of Excellence grant for Education, Research and Engineering: 0.00

The number of undergraduates funded by your agreement who graduated during this period and intend to work for the Department of Defense: 0.00

The number of undergraduates funded by your agreement who graduated during this period and will receive scholarships or fellowships for further studies in science, mathematics, engineering or technology fields: 0.00


Names  of  Personnel  receiving  masters  degrees 


NAME 

Kevin  Shen 

David  Hermina-Martinez 

C.  Y.  Lee 

Total  Number:  3 

Names  of  personnel  receiving  PHDs 

NAME 

Aleksey  Polunchenko 

Andrew  Papanicolaou 

Qian  Yu 

Alex  Chen 

Nancy  Rodriguez 

Paul  Jones 

Jason  Fuemmeler 

C.  Y.  Lee 

Dinh  Ba  Thang 

Total  Number:  9 

Names  of  other  research  staff 

NAME  PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Sub  Contractors  (DD882) 


Inventions  (DD882) 


Scientific  Progress 


See  the  attached  file 


Technology  Transfer 


SPATIO-TEMPORAL NONLINEAR FILTERING WITH APPLICATIONS
TO  INFORMATION  ASSURANCE  AND  COUNTER  TERRORISM 
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tracking  without  a  change-point  detection  modification.  Middle:  Boundary  tracking  with 
the CUSUM algorithm. Right: Threshold dynamics - global segmentation method . 111

6.1 Time-rolled diagram of an Event-Coupled Factorial HMM . 113

6.2  Correlation  between  ACU/ADA  scores  and  inferred  probabilities . 118 

6.3 Comparison of inference results with ACU and ADA scores: Sen. Specter (top) and Sen.
Dole (bottom) . 118

6.4  Polarization  trends  during  97th-104th  US  Congresses . 119 

6.5 MAP characteristics versus the noise intensity in the regimes m = 1,2,3 for q = 0.24:
(a) Overlap, (b) Entropy. In (a) the open squares represent simulation results, obtained
by running the Viterbi algorithm and calculating the respective quantities directly. We used
sequences of size 10^4, and averaged the results over 100 random trials . 121

6.6 (a) Magnetization m vs. a for different p. (b) Detectable-non detectable boundary for
different p . 125

6.7  (a)  Magnetization  plotted  against  a  for  different  p.  Lines  are  generated  from  population 

dynamics  and  points  are  generated  from  simulated  annealing.  From  bottom  to  top  we  have 
p  =  0,  0.5, 1,  2.  (b)  Location  of  the  m  =  0  threshold  on  the  (a,  7)  plane.  Dashed  line 
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7.2  (left)  Cover  of  March  2,  2010  issue  of  PNAS.  (right)  Crime  hotspot  suppression  figure  from 

the  cover  article  [151].  Suppression  results  for  the  PDE  system  with  parameters  chosen  to 
generate  supercritical  or  subcritical  crime  hotspots.  (A)  Suppression  of  supercritical  crime 
hotspots.  Shown  is  the  configuration  of  supercritical  hotspots  at  timestep  t  =  100,  just  prior 
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data,  (a)  and  (b)  show  the  results  of  the  current  methods  Kernel  Density  Estimation  and  TV 
MPLE, respectively. The results from our Modified TV MPLE method and our Weighted H1
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of  the  IEEE  International  Symposium  on  Information  Theory,  St.  Petersburg,  Russia,  July  31  -  August 
5, 2011.

3. A.G. Tartakovsky and A.S. Polunchenko, “Minimax Optimality of the Shiryaev-Roberts Procedure,”
Proceedings  of  the  5th  International  Workshop  in  Applied  Probability,  Universidad  Carlos  III  de 
Madrid,  Colmenarejo  Campus,  Spain,  5-8  July  2010  (Invited). 

4. A.G. Tartakovsky, A.S. Polunchenko, and G.V. Moustakides, “Design and Comparison of Shiryaev-Roberts-
and CUSUM-Type Change-Point Detection Procedures,” Proceedings of the 2nd International
Workshop in Sequential Methodologies, University of Technology of Troyes, Troyes, France,
15-17  June  2009  (Invited). 
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5. A.G. Tartakovsky, A.P. Brown, and J. Brown, “Enhanced Algorithms for EO/IR Electronic Stabilization,
Clutter Suppression, and Track-Before-Detect for Multiple Low Observable Targets,” The 10th
Advanced  Maui  Optical  and  Space  Surveillance  Technologies  Conference,  Maui,  Hawaii,  September 
2009. 

6. A.G. Tartakovsky and A.S. Polunchenko, “Quickest Changepoint Detection in Distributed Multisensor
Systems under Unknown Parameters,” Proceedings of the 11th International Conference on Information
Fusion, Hyatt Regency Hotel, Cologne, Germany, 2008, pp. 878-885 (Invited).

7. A.G. Tartakovsky, M. Pollak, and A.S. Polunchenko, “Asymptotic Exponentiality of First Exit Times
for  Recurrent  Markov  Processes  and  Applications  to  Changepoint  Detection,”  Proceedings  of  the  2008 
International  Workshop  on  Applied  Probability,  Compiegne,  France,  7-10  July  2008  (Invited). 

8.  A.G.  Tartakovsky  and  H.  Kim,  “Performance  of  Certain  Decentralized  Distributed  Change  Detection 
Procedures,”  Proceedings  of  the  9th  International  Conference  on  Information  Fusion,  Florence,  Italy, 
2006, CD ISBN 0-9721844-6-5, IEEE Catalog No. 06EX1311C (Invited).

9.  A.G.  Tartakovsky  and  A.S.  Polunchenko,  “Decentralized  Quickest  Change  Detection  in  Distributed 
Sensor  Systems  With  Applications  to  Information  Assurance  and  Counter  Terrorism,”  Proceedings  of 
the  13th  Annual  Army  Conference  on  Applied  Statistics,  Rice  University,  Houston,  TX,  17-19  October 
2007  (Invited). 

10. P.J. Brantingham and M.B. Short, “Crime Emergence,” In When Crime Appears: The Role of Emergence,
edited by J.M. McGloin, C. Sullivan and L.W. Kennedy. New York: Routledge (in press).

11.  M.  Gonzalez,  X.  Huang,  B.  Irvine,  D.  S.  Hermina  Martinez,  C.  H.  Hsieh,  Y.  R.  Huang,  M.  B.  Short, 
and  A.  L.  Bertozzi,  “A  Third  Generation  Micro-vehicle  Testbed  for  Cooperative  Control  and  Sensing 
Strategies,”  Proceedings  of  the  8th  International  Conference  on  Informatics  in  Control,  Automation 
and  Robotics  (ICINCO),  pp.  14-20,  2011. 

12.  Wangyi  Liu,  Martin  B.  Short,  Yasser  E.  Taima,  and  Andrea  L.  Bertozzi,  “Multiscale  Collaborative 
Searching  Through  Swarming,”  Proceedings  of  the  7th  International  Conference  on  Informatics  in 
Control,  Automation,  and  Robotics  (ICINCO),  Portugal,  June  2010. 

13.  J.H.  von  Brecht,  S.  Thiruvenkadam  and  T.F.  Chan,  “Occlusion  Tracking  Using  Logic  Models,”  9th 
IASTED  Conf  on  Signal  and  Image  Processing,  2007. 

14. A. Joshi, T. Ashley, Y. Huang, and A. L. Bertozzi, “Experimental validation of cooperative environmental
boundary tracking with on-board sensors,” American Control Conference, St. Louis, MO, June
2009,  pp.  2630-2635. 

15. Z. Jin and A. L. Bertozzi, “Environmental Boundary Tracking and Estimation using Multiple Autonomous
Vehicles,” Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans,
LA, 2007, pp. 4918-4923.

16. Y. Landa, D. Galkowski, Y. R. Huang, A. Joshi, C. Lee, K. K. Leung, G. Malla, J. Treanor, V. Voroninski,
A. L. Bertozzi, and R. Tsai, “Robotic path planning and visibility with limited sensor data,”
Proceedings  of  the  2007  American  Control  Conference. 

17.  Kevin  K.  Leung,  Chung  H.  Hsieh,  Yuan  R.  Huang,  Abhijeet  Joshi,  Vlad  Voroninski,  and  Andrea  L. 
Bertozzi,  “A  second  generation  micro-vehicle  testbed  for  cooperative  control  and  sensing  strategies,” 
Proceedings  of  the  2007  American  Control  Conference,  pp.  1900-1907. 

18. Y.-L. Chuang, Y. R. Huang, M. R. D'Orsogna, and A. L. Bertozzi, “Multi-vehicle flocking: scalability
of cooperative control algorithms using pairwise potentials,” IEEE International Conference on
Robotics  and  Automation,  2007,  pp.  2292-2299. 

19.  A.  Allahverdyan  and  A.  Galstyan,  “Comparative  Analysis  of  Viterbi  Training  and  ML  Estimation  for 
HMMs,”  In  Neural  Information  Processing  Systems  (NIPS),  2011. 

20. Y.S. Cho, G. Ver Steeg, and A. Galstyan, “Co-evolution of Selection and Influence in Social Networks,”
In Proc. of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), 2011.

21. A. Allahverdyan and A. Galstyan, “On Maximum a Posteriori Estimation of Hidden Markov Processes,”
In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI-09), Montreal,
Canada,  2009. 




Final Technical Report ARO MURI Grant # W911NF-06-1-0094: Spatio-Temporal Nonlinear Filtering with Applications to Information Assurance and Counter Terrorism


22.  A.  Galstyan  and  P.R.  Cohen,  “Relational  Classification  Through  Three-State  Epidemic  Dynamics,”  In 
Proceedings  of  the  9th  International  Conference  on  Information  Fusion  (FUSION’06),  special  session 
on  Making  Histories,  Florence,  Italy,  2006. 

23. A. Galstyan and P.R. Cohen, “Empirical Comparison of “Hard” and “Soft” Label Propagation for
Relational Classification,” In Proceedings of the International Conference on Inductive Logic Programming
(ILP-07), Corvallis, OR, 2007.

24. T. B. Dinh, N. Vo, and G. Medioni, “Context Tracker: Exploring Supporters and Distracters in Unconstrained
Environments,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Colorado  Springs  CO,  Jun  20-25  2011. 

25. T. B. Dinh, N. Vo, and G. Medioni, “High Resolution Face Sequences from a PTZ Network Camera,”
IEEE Conference on Automatic Face and Gesture Recognition (FG), Santa Barbara CA, Mar
21-25  2011. 

26.  T.  B.  Dinh  and  G.  Medioni,  “Co-training  Framework  of  Generative  and  Discriminative  Trackers 
with Partial Occlusion Handling,” IEEE Workshop on Motion and Video Computing (WMVC), Kona
Hawaii,  Jan  5-7  2011. 

27.  Q.  Yu  and  G.  Medioni,  “Motion  Pattern  Interpretation  and  Detection  for  Tracking  Moving  Vehicles 
in  Airborne  Videos,”  IEEE  Conference  on  Computer  Vision  and  Pattern  Recognition  (CVPR),  Miami 
FL,  Jun  20-25  2009. 

28.  T.  Dinh,  Q.  Yu,  and  G.  Medioni,  “Real  Time  Tracking  Using  an  Active  Pan-Tilt-Zoom  Network 
Camera,”  IEEE/RSJ  International  Conference  on  Intelligent  Robots  and  Systems  (IROS),  St.  Louis 
MO,  Oct  11-15  2009. 

29.  Q.  Yu,  T.  Dinh,  and  G.  Medioni,  “Online  Tracking  and  Reacquisition  Using  Co-trained  Generative 
and  Discriminative  Trackers,”  European  Conference  on  Computer  Vision  (ECCV),  Marseille,  France, 
Oct  12-18,  2008. 

30. Q. Yu, G. Medioni, “Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven
MCMC Data Association,” IEEE Workshop on Motion and Video Computing (WMVC), Copper
Mountain,  CO,  Jan  08-09  2008. 

31. Q. Yu, G. Medioni, “A GPU-based implementation of Motion Detection from a Moving Platform,” IEEE
Workshop on Computer Vision on GPU (CVGPU), Anchorage AK, Jun 23-28 2008.

32.  Q.  Yu,  G.  Medioni,  and  I.  Cohen,  “Multiple  Target  Tracking  Using  Spatio-Temporal  Monte  Carlo 
Markov Chain Data Association,” IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR), Minneapolis MN, Jun 18-23 2007.

33.  Q.  Yu  and  G.  Medioni,  “Map-Enhanced  Detection  and  Tracking  from  a  Moving  Platform  with  Local 
and  Global  Data  Association,”  IEEE  Workshop  on  Motion  and  Video  Computing  (WMVC),  pp.  3-10, 
Austin  TX,  Feb  23-24  2007. 

34.  Y.  Lin,  Q.  Yu,  and  G.  Medioni,  “Map-Enhanced  UAV  Image  Sequence  Registration,”  IEEE  Workshop 
on  Applications  of  Computer  Vision  (WACV),  pp.  15-20,  Austin  TX,  Feb  23-24  2007. 

35.  T.  Banerjee  and  V.V.  Veeravalli,  “Bayesian  Quickest  Change  Detection  Under  Energy  Constraints,”  In 
Proc.  ITA  workshop,  UCSD,  San  Diego,  CA,  February  2011  (Invited). 

36. G. Atia and V.V. Veeravalli, “Sensor management for energy-efficient tracking in cluttered environments,”
In Proc. ITA workshop, UCSD, San Diego, CA, February 2011 (Invited).

37.  G.K.  Atia,  V.V.  Veeravalli  and  J.A.  Fuemmeler,  “Sensor  scheduling  for  energy-efficient  target  tracking 
in  sensor  networks,”  In  Proc.  IEEE  Asilomar  Conference  on  Signals,  Systems  and  Computers,  Pacific 
Grove,  CA,  November  2010. 

38.  K.  Premkumar,  A.  Kumar  and  V.V.  Veeravalli,  “Bayesian  Quickest  Transient  Change  Detection,”  In 
Proc.  International  Workshop  on  Applied  Probability,  Madrid,  Spain,  July  2010  (Invited). 

39. V. Raghavan and V.V. Veeravalli, “Bayesian quickest change process detection,” In Proc. IEEE ISIT,
Seoul,  South  Korea,  August  2009. 
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40.  V.  Raghavan  and  V.V.  Veeravalli,  “Quickest  Detection  of  a  Change  Process  Across  a  Sensor  Array,” 
In Proc. IEEE Fusion, Cologne, Germany, July 2008 (Invited).

41.  J.  Fuemmeler  and  V.V.  Veeravalli,  “Sensor  Scheduling  for  Effective  and  Energy  Efficient  Tracking  in 
Sensor Networks,” In Proc. IEEE CDC, New Orleans, LA, December 2007 (Invited).

42. V.V. Veeravalli and J. Fuemmeler, “Joint Optimization of Smart Sleeping and Cooperative Localization
Strategies for Energy-Efficient Tracking in Sensor Networks,” In Proc. 56th Session of the
International  Statistical  Institute  (ISI),  Lisbon,  Portugal,  August  2007  (Invited). 

43.  J.  Fuemmeler  and  V.V.  Veeravalli,  “Smart  Sleeping  Strategies  for  Localization  and  Tracking  in  Sensor 
Networks,”  In  Proc.  40th  Asilomar  Conference  on  Signals,  Systems,  and  Computers,  Monterey,  CA, 
November  2006  (Invited). 

44.  V.V.  Veeravalli  and  J.  Fuemmeler,  “Efficient  Tracking  in  a  Network  of  Sleepy  Sensors,”  In  Proc.  IEEE 
ICASSP,  Toulouse,  France,  May  2006  (Invited). 

45. G. Bartlett, J. Heidemann, and C. Papadopoulos, “Understanding Passive and Active Service Discovery,”
In Proceedings of the ACM Internet Measurement Conference, San Diego, California, USA,
October,  2007. 

46. A. Hussain, J. Heidemann, and C. Papadopoulos, “Identification of Repeated Denial of Service Attacks,”
In Proceedings of the IEEE Infocom, Barcelona, Spain, April, 2006.

47. G. Bartlett, J. Heidemann, and C. Papadopoulos, “Inherent Behaviors for On-line Detection of Peer-to-Peer
File Sharing,” In Proceedings of the 10th IEEE Global Internet, Anchorage, Alaska, USA,
May  2007. 

48.  J.  Heidemann,  Y.  Pradkin,  R.  Govindan,  C.  Papadopoulos,  G.  Bartlett,  and  J.  Bannister,  “Census  and 
Survey of the Visible Internet,” In Proceedings of the ACM Internet Measurement Conference, pp.
169-182, Vouliagmeni, Greece, October 2008.

49. C. Wilcox, C. Papadopoulos, and J. Heidemann, “Correlating Spam Activity with IP Address Characteristics,”
In Proceedings of the IEEE Global Internet Symposium, San Diego, California, USA,
March  2010. 

50.  G.  Bartlett,  J.  Heidemann,  and  C.  Papadopoulos,  “Low-Rate,  Flow-Level  Periodicity  Detection,”  In 
Proceedings  of  the  14th  IEEE  Global  Internet  Symposium,  Shanghai,  China,  April  2011. 

51. S. DiBenedetto, K. Gadkari, N. Diel, A. Steiner, D. Massey, and C. Papadopoulos, “Fingerprinting
Custom Botnet Protocol Stacks,” Proceedings of the 6th Workshop on Secure Network Protocols
(NPSec), Kyoto, Japan, October 2010.

(e)  Papers  presented  at  meetings,  but  not  published  in  conference  proceedings: 

1. A.G. Tartakovsky, “Spatiotemporal Image Processing with Applications to Remote Sensing,” Department
of Statistics and Department of Computer Sciences, University of Chicago, September, 2011
(Invited). 

2.  A.G.  Tartakovsky,  “Sequential  Changepoint  Detection:  State  of  the  Art,”  Department  of  Statistics, 
University  of  Illinois  at  Urbana-Champaign,  October,  2011  (Invited). 

3. A.G. Tartakovsky, “Spatiotemporal Image Processing with Applications to Remote Sensing,” Department
of Electrical and Computer Engineering and Coordinated Science Lab, University of Illinois at
Urbana-Champaign,  October,  2011  (Invited). 

4. A.S. Polunchenko, A.G. Tartakovsky, and N. Mukhopadhyay, “Nearly Optimal Change-Point Detection
with An Application to Cybersecurity,” The 3rd International Workshop in Sequential Methodologies,
Stanford University, 14-16 June 2011.

5. A.G. Tartakovsky and A.S. Polunchenko, “Optimality of the Shiryaev-Roberts Procedure for Detecting
Changes in Distributions,” The 73rd Annual Meeting of the Institute of Mathematical Statistics,
Gothenburg,  Sweden,  9-13  August  2010. 

6. A.P. Brown and A.G. Tartakovsky, “Spatiotemporal Clutter Rejection and Track-Before-Detect Methods
for Tracking Small Dim Objects,” 32nd Review of Atmospheric Transmission Models Meeting,
Lexington,  Massachusetts,  14-15  June  2010  (Invited). 
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7. A.G. Tartakovsky, “Spatial-Temporal Image Processing Techniques and Applications to Remote Sensing,”
Department of Mathematics, Stanford University, 2009 (Invited).

8.  A.G.  Tartakovsky,  “Quickest  Changepoint  Detection:  Recent  Advances  and  Open  Problems,”  NSF 
Sponsored Workshop in Honor of Professor A.V. Balakrishnan, January 30, 2009 (Invited).

9.  A.G.  Tartakovsky,  “Efficient  Numerical  Methods  for  Optimization  and  Performance  Evaluation  of 
Changepoint  Detection  Procedures,”  Department  of  Probability,  Moscow  State  University,  Moscow, 
Russia,  March  25,  2009  (Invited). 

10.  A.G.  Tartakovsky,  “Adaptive  Spatial-Temporal  Image  Processing  Techniques  and  Applications  to 
Clutter Rejection in Remote Sensing,” Workshop “Spatiotemporal Image Processing and Visual Surveillance”,
University of Southern California, 2008 (Invited).

11.  A.G.  Tartakovsky,  “Exact  Optimality  of  the  Shiryaev-Roberts  Procedure  for  Detecting  Changes  in 
Distributions,”  Department  of  Mathematical  Sciences,  University  of  Technology,  Sydney,  Australia, 
November  27,  2008  (Invited). 

12. A.G. Tartakovsky, “Detection and Classification in Distributed Multisensor Systems with Applications
to Network Security,” Workshop “Sensor Networks and Future Internet Security”, University of
Southern  California,  May  23,  2007  (Invited). 

13.  A.G.  Tartakovsky,  “Asymptotic  Optimality  in  Sequential  Quickest  Change-Point  Detection:  Theory 
and  Applications,”  Princeton,  September  25,  2007  (Invited). 

14.  A.G.  Tartakovsky,  “Quickest  Change-Point  Detection:  Previous  Achievements  and  Open  Problems,” 
First International Workshop on Sequential Methodologies, Auburn, AL, 22-25 July 2007 (Invited).

15. A.G. Tartakovsky, “Asymptotic Optimality in Sequential Hypothesis Testing and Quickest Change-Point
Detection for General Continuous-Time Stochastic Processes,” Workshop on Inverse Problems
in  Stochastic  Differential  Equations,  University  of  Southern  California,  Los  Angeles,  CA,  22-26  May, 
2007  (Invited). 

16.  A.G.  Tartakovsky,  “An  Asymptotically  Optimal  Change  Detection  Strategy  Under  Nontraditional 
Global  False  Alarm  Probability  Constraint,”  The  2007  Taipei  International  Statistical  Symposium  and 
ICSA  International  Conference  (Session:  Change-Point  Analysis  and  Applications),  Taipei,  Taiwan, 
24-28  June,  2007  (Invited). 

17. V.V. Veeravalli and A.G. Tartakovsky, “Quickest Change Detection in Sensor Networks,” First International
Workshop on Sequential Methodologies, Auburn, AL, July 2007 (Invited).

18. B.L. Rozovsky, “Generalized Malliavin calculus and Stochastic PDEs,” Columbia University, Minerva
Foundation Lectures, December 2010.

19. B.L. Rozovsky, “Stochastic Fluid Dynamics,” NSF Institute for Pure and Applied Mathematics, Invited
lecture, January 2011.

20.  B.L.  Rozovsky,  “Stochastic  Fluids  and  Malliavin  Calculus,”  Conference  on  Malliavin  Calculus  and 
Stochastic  Analysis,  University  of  Kansas,  Invited  talk,  March  2011. 

21.  B.L.  Rozovsky,  “On  Unbiased  Stochastic  Navier-Stokes  Equation,”  Workshop  on  SPDEs,  Archimedes 
Center for Modeling, Analysis, and Computations, Heraklion, Greece, Invited lecture, June 2011.

22. B.L. Rozovsky, “Recent Advances in Nonlinear Filtering,” Imperial College, London. Invited lecture,
June  2011. 

23. B.L. Rozovsky, “Stochastic Fluid Dynamics and Malliavin Calculus,” Oxford University, Invited lecture,
2011.

24. B.L. Rozovsky, “Uncertainty Quantification and Nonlinear Filtering,” ICIAM 2011, Vancouver, Canada,
2011. 

25.  B.L.  Rozovsky,  “On  Unbiased  Stochastic  Navier-Stokes  Equation,”  ICIAM  2011,  Vancouver,  Canada, 
2011. 

26. B.L. Rozovskii, Invited Talk, SIAM conference on Computational Science and Engineering, Miami,
2009. 

27.  B.L.  Rozovskii,  Invited  Talk,  7th  ISAAC  Congress,  London,  2009. 
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28.  B.L.  Rozovskii,  Invited  Talk,  International  Conference  on  Spectral  and  High  Order  Methods,  2009, 
Trondheim,  Norway. 

29. Alethea Barbaro, Agent-based Complex Systems Workshop at IPAM: Organized and spoke October
14, 2009, 1 hour. Title: “Agent-based modeling for animal migration and gang behavior”.

30. Alethea Barbaro, American Soc. Criminology Meeting, Philadelphia, joint presentation with Shannon
Reid, Nov. 6, 2009, “Agent-based simulations: modeling gang violence in Hollenbeck”.

31.  Alethea  Barbara,  UCSB  Hypatian  Seminal-,  Nov  30,  2010,  ’’Agent-based  modeling  of  complex  sys¬ 
tems,  and  how  to  claim  your  mathematical  territory  after  your  doctorate” 

32.  Alethea  Barbara,  IPAM’s  Optimal  Transport  Reunion  Workshop  at  Lake  Arrowhead  (invited  talk) 
December  10,  2009,  20  minutes,  Title:  ”On  limits  of  a  discrete  time  interacting  particle  system” 

33.  Alethea  Barbara,  2nd  Annual  Southern  California  Women  in  Math  Symposium  February  20,  2010, 
30  minutes.  Title:  ’’Agent-based  models  of  social  dynamics”. 

34.  Alethea  Barbara,  2010  Mathematics  Festival  at  UCLA  (2  sessions)  February  13,  2010,  Two  sessions, 
each  50  minutes  Title:  ’’Modeling  the  Real  World:  Using  Math  to  Study  Migration,  Territoriality,  and 
Social  Networks”. 

35.  Alethea  Barbara,  Invited  Talk  at  USC  for  the  Women  in  Math  Seminal-,  March  12,  2010,  1  hour.  Title: 
’’Simulating  social  dynamics  with  interacting  particle  models”. 

36.  Alethea  Barbara,  Invited  Seminal-  Talk  at  Redlands,  March  31,  2010,  1  hour.  Title:  ’’Simulating  Social 
Dynamics  with  Interacting  Particle  Models”. 

37.  Alethea  Barbara,  Talk  at  SIAM’S  DSPDEs  conference  in  Barcelona,  Spain  (Organized  Mini-symposium 
and  spoke).  Mini-symposium  title:  Particle  and  mean  field  models  for  flocking  and  swarming,  co¬ 
chair  Massimo  Fornasier,  June  1,  2010,  30  minutes,  Talk  Title:  ’’Interacting  particle  models  for  Social 
Dynamics”. 

38.  Alethea  Barbara,  Workshop:  Modeling  Complex  Dynamics  in  Biological  Systems,  Universite  Paul 
Sabatier,  Toulouse,  France  (invited  talk)  June  9,  2010  1  hour  Title:  ’’Interacting  particle  models  for 
animal  social  dynamics”. 

39.  Alethea  Barbara,  Workshop:  Mathematics  of  Complex  Systems,  Universite  Paul  Sabatier,  Toulouse, 
France  (invited  talk)  June  10,  2010,  45  minutes,  Title:  ’’Agent-based  models  for  gang  dynamics”. 

40.  Alethea  Barbara,  Kinetic  and  Mean-field  models  in  the  Socio-Economic  Sciences:  workshop  at 
ICMS,  Edinburgh,  Scotland  (25  minute  invited  talk),  July  31,  2009,  Title:  ’’Fish  migration,  interacting 
particles,  and  scaling  laws”. 

41. Andrea Bertozzi, Women in Mathematics Seminar, Univ. of Wisconsin, Madison, WI, October 7,
2009.

42. Andrea Bertozzi, Invited talk and co-organizer, IPAM workshop on “Agent Based Complex Systems”,
October 14, 2009.

43. Andrea Bertozzi, Invited address, Southern California-Nevada MAA Section Meeting, October 17,
2009.

44. Andrea Bertozzi, Invited talk, Workshop on Self-Organization and Multi-Scale Mathematical
Modeling of Active Biological Systems, Statistical and Applied Mathematical Sciences Institute, Durham,
NC, October 27, 2009.

45. Andrea Bertozzi, Invited talk, Army Research Office, Durham, NC, October 28, 2009.

46. Andrea Bertozzi, Invited talk, UBC Vancouver, PIMS mini-symposium in PDE, one hour talk,
November 13, 2009.

47. Andrea Bertozzi, Invited talk on “A Variational Approach to Hyperspectral Image Fusion”,
Minisymposium on Variational Methods in Image Processing and Interface Problems, Maria Westdickenberg
and Sung Ha Kang, Organizers, SIAM Conference on Analysis of Partial Differential Equations,
Miami, December 7, 2009.

48. Andrea Bertozzi, Invited talk on “Mathematical Models for Urban Crime”, Minisymposium on
“Nonlinear Stochastic PDEs and Applications to Complex Systems”, Hakima Bessaih and Bjorn Birnir,
Organizers, SIAM Conference on Analysis of Partial Differential Equations, Miami, December 8,
2009.

49. Andrea Bertozzi, Invited talk in SIAM Minisymposium on New Trends in Mathematical Methods in
Imaging Science, Rick Chartrand, Stacey Levine, Jennifer Mueller, and Luminita Vese organizers,
Joint Math Meetings, San Francisco, Sat Jan 16, 2010

50.  Andrea  Bertozzi,  Invited  talk,  China  Lake  Distinguished  Speakers  Colloquium  Series,  China  Lake 
Naval  Air  Warfare  Center,  Ridgecrest,  CA,  Feb  2,  2010 

51. Andrea Bertozzi, Invited talk, Rand Corp., Santa Monica, Feb 11, 2010

52. Andrea Bertozzi, Invited talk, Session on “Traffic, Crowds and Society”, AAAS Annual Meeting, San
Diego, February 20, 2010

53. Andrea Bertozzi, Invited talk, Imperial College London, Institute for Mathematical Sciences, invited
talk in three part session on Geometric Mechanics, Darryl Holm host, March 8, 2010

54. Andrea Bertozzi, Fluid Mechanics Seminar, DAMTP, Univ. of Cambridge, UK, March 5, 2010

55. Andrea Bertozzi, Brown University, Mathematics Department, Distinguished Lecture Series, three
one hour lectures, March 11-12, 2010

56. Andrea Bertozzi, Brown University, Mathematics Department, faculty speaker, Symposium for
Undergraduates in the Mathematical Sciences, 45 minute talk, March 13, 2010

57. Andrea Bertozzi, Invited talk, Minisymposium on Advanced Frameworks for Restructuring High
Dimensional Datasets, SIAM Conf. on Imaging Science, Chicago, IL, April 13, 2010, Edward H. Bosch,
Organizer.

58. Andrea Bertozzi, Invited plenary talk, Joint SIAM/RSME-SCM-SEMA Meeting on Emerging
Topics in Dynamical Systems and Partial Differential Equations (DSPDEs’10), June 1, 2010, Barcelona,
Spain

59. Andrea Bertozzi, Invited talk, 2010 DTRA/NSF Algorithm workshop, talk on “Undergraduate
Research Training in Defense Applications”, June 22, 2010, Chapel Hill, NC

60. Andrea Bertozzi, Invited talk, 2010 DTRA/NSF Algorithm workshop, talk on “Imaging of
multispectral and hyperspectral data”, June 23, 2010, Chapel Hill, NC

61. Andrea Bertozzi, invited talk in workshop Fluid Dynamics Analysis and Numerics, a conference
in honor of Tom Beale’s 60th Birthday, Duke Univ., Durham, NC, June 28, 2010

62.  Andrea  Bertozzi,  invited  talk  at  Park  City  Mathematics  Institute,  Program  on  Imaging  Sciences,  Park 
City  UT,  July  5,  2010 

63.  Andrea  Bertozzi,  Graduate  School  of  Engineering  and  Applied  Sciences,  Distinguished  Lecture, 
Naval  Postgraduate  School,  Sept.  2,  2010. 

64. Andrea Bertozzi, London Taught Course Centre, 8 hour intensive course on Mathematics of Crime,
Univ. College London, Sept. 9-10, 2010.

65. Andrea Bertozzi, Department of Applied Mathematics and Statistics, Johns Hopkins University,
Colloquium, Sept. 16, 2010.

66. Andrea Bertozzi, Allman Family Public Lecture, Southern Methodist University, Mathematics in the
Real World, Sept. 23, 2010.

67. Andrea Bertozzi, Invited talk, IPAM workshop on Machine Reasoning: Mission Focused Actions/Reactions
Based on System Integration of Information Derived from Complex Real-World Data, Oct 19, 2010.

68. Andrea Bertozzi, Distinguished Lecture, Department of Mathematics, Simon Fraser Univ., Oct. 29,
2010.

69. Andrea Bertozzi, Invited talk, 9th Annual Image Fusion Workshop, Institute for Defense and
Government Advancement, Tyson’s Corner, VA, November 16, 2010.

70. Andrea Bertozzi, Invited talk, RCIM Symposium Mathematical Aspects of Image Processing and
Computer Vision 2010, Sapporo, Japan, November 26, 2010.

71. Andrea Bertozzi, Invited talk, NSF workshop on New Directions in Dynamical Systems Inspired by
Biological, Energy, Environmental, and Information Sciences, Atlanta, GA, Jan 4, 2011

72. Andrea Bertozzi, Invited talk, Dynamics Days, Chapel Hill, NC, Jan 5, 2011

73. Andrea Bertozzi, AMS Invited Address, Joint Mathematics Meetings, New Orleans, LA, Jan 7, 2011

74. Andrea Bertozzi, Invited Talk (one hour), 2011 annual meeting of the Australian and New Zealand
Industrial and Applied Mathematics division of the Australian Mathematical Society, ANZIAM 2011
in Glenelg, Australia, Feb. 1, 2011

75. Andrea Bertozzi, PIMS Applied Mathematics Seminar, University of Saskatchewan, Saskatoon, March
14, 2011

76. Andrea Bertozzi, Seminar, École Normale Supérieure de Cachan, Centre de Mathématiques et de Leurs
Applications, March 17, 2011

77. Andrea Bertozzi, Groupe de travail - Mathématiques de la décision, Seminar, Univ. of Toulouse,
March 24, 2011

78. Andrea Bertozzi, Colloquium de l’Institut de Mathématiques de Toulouse, March 25, 2011

79. Andrea Bertozzi, Mathematics Colloquium, Univ. of Warwick, UK, June 3, 2011, “Mathematics of
Crime”

80. Andrea Bertozzi, Nonlinear Diffusion: Applications, Analysis and Computation, conference to
celebrate the 60th Birthday of Charlie Elliot, Univ. Warwick, June 6-8, 2011, invited 45 minute talk.

81. Andrea Bertozzi, 7th East Asian SIAM meeting, Waseda University Kitakyushu Campus, Japan, Keynote
Talk, June 29, 2011

82. Andrea Bertozzi, Invited talk, Minisymposium on Modern Methods and Applications of the Calculus
of Variations: Image Processing, July 20, 2011, International Congress on Industrial and Applied
Mathematics, Vancouver BC

83. Andrea Bertozzi, Invited talk, Duke Workshop on Sensing and Analysis of High-Dimensional Data
(SAHD), July 26, 2011

84. Andrea Bertozzi, Plenary talk, AWM 40th Anniversary Conference, ICERM, Brown University,
September 18, 2011

85. Andrea Bertozzi, Applied Mathematics Seminar, Mathematics of Crime, Harvard University,
September 19, 2011

86. Andrea Bertozzi, Widely Applied Mathematics Seminar, Swarming by Nature and by Design, Harvard
University, September 20, 2011

87. P. Jeffrey Brantingham, Repeats and Reprisals: The Dynamics of Burglary and Rival Gang Violence
in Los Angeles. Invited lecture at the Workshop on Modeling and Analysis of Security, January 4-7,
2010, University of Chile.

88. P. Jeffrey Brantingham, “Agent-based and continuum models of crime pattern formation,” Invited
lecture presented at the Agent-based Complex Systems workshop, Institute of Pure and Applied
Mathematics, UCLA, October 12-14, 2009.

89. P. Jeffrey Brantingham, “Why seeking to reduce gang rivalries might increase gang violence,” UC
Irvine Criminology, Law and Society, April 13, 2011.

90. P. Jeffrey Brantingham, “The Mathematical Ecology of Criminal Street Gangs,” UCLA Marschak
Colloquium, April 8, 2011.

91. P. Jeffrey Brantingham, Stochastic Models of Crime with Practical Implications for Policing.
Workshop on Geospatial Abduction Problems, University of Maryland, March 3-4, 2011.

92. P. Jeffrey Brantingham, “University-Agency Collaboration in Predictive Policing,” 10th Anniversary
Celebration of the Institute for Canadian Urban Research, Simon Fraser University, February 3, 2011.

93. Maria D’Orsogna, invited talk, Kinetic and mean-field models in the socio-economic sciences,
Edinburgh, Scotland, July 2009

94. Erik Lewis, poster presentation at the Agent-Based Complex Systems workshop at IPAM, October
12-14, 2009. “Comparing Gang Rivalries and Civilian Deaths in Iraq Using Self-Exciting Point
Processes.”

95. George Mohler, “Agent-based, Bayesian Geographic Profiling,” Workshop on Analysis and Modeling
of Security, Jan 4-7, 2010, Santiago, Chile, 40 minute talk.

96. George Mohler, “Crime as a Self-Exciting Point Process: An innovative approach in crime
prediction,” American Society of Criminology annual meeting, Nov 4-7, 2009, 20 minute talk.

97. Todd Wittman, Contributed talk. 25 minutes. “Image Processing in the UCLA REU Program.”
International Conference on Technology in Collegiate Mathematics. Chicago, IL. March 2010.

98. Todd Wittman, Contributed talk. 20 minutes. “Problems in Geospatial Image Processing.” Center for
Nonlinear Analysis Summer School on Image Processing and PDEs. Pittsburgh, PA. June 2010.

99. Todd Wittman, “The UCLA Math REU Program: Getting Students Involved in Research.” University
of Southern California, Department of Mathematics Colloquium. Los Angeles, CA. April 2010.

100. Todd Wittman, Contributed talk. 25 minutes. “Variational Methods in Hyperspectral Image
Processing.” SIAM Conference on Analysis of Partial Differential Equations. Miami, FL. December 2009.

101.  Y.  S.  Cho,  G.  Ver  Steeg,  and  A.  Galstyan  “Co-Evolving  Mixed-Membership  Blockmodels”,  NIPS 
Workshop  on  Networks  Across  Disciplines,  2010. 

102.  A.  Allahverdyan,  A.  Galstyan,  and  G.V.  Steeg,  “Clustering  with  Prior  Information,”  NIPS  Workshop: 
Clustering:  Science  or  Art?  Towards  Principled  Approaches,  2009. 

103. A. Galstyan, “Modeling Covert Activities with Hidden Markov Processes,” SIAM CADS Mini-symposium
on Terrorism Modeled as a Dynamical System, Snowbird, Utah, 2009 (invited).

104. A. Galstyan and P.R. Cohen, “Comparing Diffusion Models for Graph-Based Semi-Supervised
Learning,” 6th International Workshop on Mining and Learning with Graphs (MLG-08), Helsinki, Finland,
2008.

105. A. Galstyan and P.R. Cohen, “Influence Propagation in Modular Networks,” AAAI Symposium on
Social Information Processing (SIP-08), Stanford, CA, 2008.

106. A. Galstyan, S. Mitra, and P.R. Cohen, “Probabilistic Tracking of Plans and Intentions in Intelligence
Analysis”, talk presented at the WNAR/IMS Annual Meeting, UC Irvine, June 2007.

107. A. Galstyan, S. Mitra, and P.R. Cohen, “Detecting and Tracking Hostile Plans in the Hats World,”
AAAI Workshop on Plan, Activity and Intent Recognition (PAIR-07), Vancouver, Canada, 2007.

108. A. Galstyan, S. Mitra, and P.R. Cohen, “Probabilistic Plan Tracking and Detection for Intelligence
Analysis,” poster presented at the Joint Statistical Meetings (JSM), Salt Lake City, July 2007.

109. G. Medioni, Keynote lecture, Workshop on Perceptual Organization, San Francisco, CA, June 13,
2010.

110. G. Medioni, Keynote lecture, International Workshop on Computer Vision, Shenzhen Institute of
Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, July 14, 2010.

111. G. Medioni, Invited lecture, “Recent progress in object tracking (Multi target tracking, tag and track,
active tracking, tracking in flow)”, INRIA, Rocquencourt, France, October 2009.

112. G. Medioni, Keynote speaker, Los Angeles/Anaheim, 2009 World Congress on Computer Science
and Information Engineering, March 31, 2009.

113. G. Medioni, Keynote speaker, San Diego (Coronado), Automated Imaging, February 4, 2009.

114. G. Medioni, Keynote speaker, ISVC, Las Vegas, November 22, 2008.

115. G. Medioni, “Tensor Voting in 2 to N dimensions: Fundamental Elements,” Distinguished Lecture,
Brown University, September 15, 2008.

116. V.V. Veeravalli, “Sensor Control for Information Collection and Fusion.” International Workshop on
Information Fusion, Xi’an, China, August 2011 (Plenary Lecture).

117. V.V. Veeravalli and T. Banerjee, “Quickest Change Detection with On-Off Observation Control.”
International Workshop in Sequential Methodologies, Palo Alto, CA, June 2011 (Invited).

118. V.V. Veeravalli and J. Fuemmeler, “Energy-Efficient Multi-Target Tracking Using Sensor Networks.”
Army Conference on Applied Statistics (ACAS), Lexington, VA, October 2008 (Invited).

119. V.V. Veeravalli, “System-Theoretic Foundations for Sensor Networks.” IEEE Communication Theory
Workshop, Sedona, AZ, May 2007 (Keynote Lecture).

120. V.V. Veeravalli, “System-Theoretic Foundations for Sensor Networks.” IWWAN, New York, NY, June
2006 (Keynote Lecture).

121.  V.V.  Veeravalli,  “Smart  Sleeping  Policies  for  Wireless  Sensor  Networks.”  NSF  Workshop  on  Future 
Directions  in  Networked  Sensing,  Boston,  MA,  May  2006  (Invited). 

(2)  Demographic  Data  for  this  Reporting  Period: 

(a)  Number  of  Manuscripts  submitted  during  this  reporting  period:  148 

(b)  Number  of  Peer  Reviewed  Papers  published  during  this  reporting  period:  136 

(c)  Number  of  books:  3 

(d)  Number  of  Non-Peer  Reviewed  Papers  submitted  during  this  reporting  period:  0 

(e)  Number  of  Presented  but  not  Published  Papers  submitted  during  this  reporting  period:  121 

(3)  Demographic  Data  for  the  life  of  this  agreement: 

(a)  Number  of  Scientists  Supported  by  this  agreement:  75 

(b)  Number  of  Inventions  resulting  from  this  agreement:  4 

(c)  Number  of  PhD(s)  awarded  as  a  result  of  this  agreement:  7 

(d)  Number  of  Bachelor  Degrees  awarded  as  a  result  of  this  agreement:  10 

(e)  Number  of  Patents  Submitted  as  a  result  of  this  agreement:  4 

(f)  Number  of  Patents  Awarded  as  a  result  of  this  agreement:  none 

(g)  Number  of  Grad  Students  supported  by  this  agreement:  28 

(h)  Number  of  FTE  Grad  Students  supported  by  this  agreement:  12 

(i)  Number  of  Post  Doctorates  supported  by  this  agreement:  13 

(j)  Number  of  FTE  Post  Doctorates  supported  by  this  agreement:  8 

(k)  Number  of  Faculty  supported  by  this  agreement:  12 

(l)  Number  of  Other  Staff  supported  by  this  agreement:  none 

(m)  Number  of  Undergrads  supported  by  this  agreement:  15 

(n)  Number  of  Master  Degrees  awarded  as  a  result  of  this  agreement:  4 

(4)  Student  Metrics  for  graduating  undergraduates  funded  by  this  agreement: 

(a)  Number  of  undergraduates  funded  by  your  agreement  during  this  reporting  period:  15 

(b) Number of undergraduates funded by your agreement, who graduated during this period: 12

(c)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period  with  a 
degree  in  a  science,  mathematics,  engineering,  or  technology  field:  12 

(d)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period  and  will 
continue  to  pursue  a  graduate  or  PhD  degree  in  a  science,  mathematics,  engineering,  or  technology  field:  10 

(e)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period  and  intend 
to  work  for  the  Defense  Department:  none 

(f)  Number  of  undergraduates  graduating  during  this  period,  who  achieved  at  least  a  3.5  GPA  based  on 
a  scale  with  a  maximum  of  a  4.0  GPA.  (Convert  GPAs  on  any  other  scale  to  be  an  equivalent  value  on  a  4.0 
scale.):  12 

(g)  Number  of  undergraduates  working  on  your  agreement,  who  graduated  during  this  period  and  were 
funded  by  a  DOD  funded  Center  of  Excellence  for  Education,  Research  or  Engineering:  none 

(h)  Number  of  undergraduates  funded  by  your  agreement,  who  graduated  during  this  period  and  will 
receive  a  scholarship  or  fellowship  for  further  studies  in  a  science,  mathematics,  engineering  or  technology 
field:  none 

(5)  Report  of  inventions 

Patent applications:

1. G. Medioni and Q. Yu, USC File No: 4048, “Spatio-Temporal Multiple Target Tracking Using Markov
Chain Monte Carlo Data Association”.

2. G. Medioni and Q. Yu, USC File No: 4109 “Online Tracking Using Co-trained Generative and Dis¬

Chapter 2

Efficient  Spatiotemporal  Nonlinear 
Filtering  Methods  for  Recognition  and 
Tracking  of  Patterns  and  Trends 


The work presented in this chapter was performed by Dr. Rozovsky’s group at Brown University in
collaboration with Dr. Lototsky (USC), Dr. Mikulevicius (USC), Dr. Tartakovsky (USC), and Dr. Wan
(Princeton).

1.  Introduction 

Until recently, applications of nonlinear filtering (NLF) for hidden Markov models (HMM) have focused
mostly on state processes of small to medium complexity, which is insufficient for many applications of
interest to the DOD and the U.S. Army in particular. One of the objectives of the research supported by
this Grant is the development of spatiotemporal NLF algorithms and their DOD-relevant applications.

The main scientific barrier in spatiotemporal filtering is the enormous computational complexity. The
fundamental challenge of our research is to find effective models of spatiotemporal systems and
algorithms that allow real-time processing of these models.

Two different types of spatial structure appear in applications of interest for this research: video streams
(in video surveillance) and graphs (in information assurance (IA)). Some progress has already been made
in applying NLF to the analysis of video streams. However, the IA applications of NLF are quite novel.
In video tracking, the spatial components are images characterized by shapes, curves, etc. In IA, the spatial
components are graphs depicting relations between IP addresses and port numbers. Other variables, such
as source address, source port, protocol, etc., could also be included. Source address and port, however, are
typically spoofed in attacks (this corresponds to occlusions and distortions in image analysis).

One of the main objectives of this project is the development of spatiotemporal NLF algorithms and their
DOD-relevant applications. The most promising developments in tracking of distributed images were made
in [25]. There were interesting developments in quantum MCMC for tracking very large systems (see [140]
and references therein); Zakai and Kushner equations for comparatively simple linear spatiotemporal state
processes were derived in [3]. Nevertheless, the field is still in its infancy.

2.  Spatiotemporal  Nonlinear  Filtering  for  Recognition  and  Tracking  of  Patterns  and  Trends 
2.1.  A  Bayesian  Approach  to  Recognition  of  Patterns  and  Trends 

In this project, we have further developed a probabilistic Bayesian framework embedded into dynamical
distributed systems, more specifically, a nonlinear filtering approach for the analysis of heterogeneous
spatiotemporal data. A distinctive advantage of the NLF formalism is that it generates optimal tracking
strategies. The NLF approach turned out to be instrumental in tracking point targets with highly nonlinear
dynamics and observation structure. The extension of NLF to spatiotemporal systems attempted in the present
research provides algorithmic support for developing multi-target multi-sensor trackers. Such trackers
will be capable of dealing with a large number of extended targets with unknown shapes that are monitored
by a network of stationary or moving sensors in complicated scenarios characteristic of urban warfare and
counter-terrorism.

The  analysis  and  synthesis  of  high-volume  data  and,  in  particular,  spatiotemporal  data  is  addressed 
by  different  disciplines  and  from  different  perspectives.  Our  approach  is  probabilistic  in  nature:  more 
specifically,  it  is  Bayesian.  One  important  feature  of  the  Bayesian  approach  is  that  it  interprets  the  data  not 
as  a  self-contained  information  depository  but  in  light  of  already  available  knowledge  (e.g.,  human  expertise) 
regarding  the  events  reflected  by  the  data.  This  feature  is  an  ideal  instrument  for  keeping  human  operators 
“in  the  loop”  in  the  process  of  automated  decision  making. 

Nonlinear filtering is an extension of the Bayesian framework into dynamical systems. The Kalman filter,
designed for linear dynamical systems and linearly structured observations, is probably the most famous
Bayesian filter. Generalizations of the Kalman filter to nonlinear systems and/or observations are usually
referred to as nonlinear filtering. NLF is a field on the cutting edge of contemporary stochastic analysis,
information theory, and statistical inference. This is an emerging methodology with an enormous breadth of
applications.

To be more specific, consider the following simple model with two sequences $(x_t)_{t \ge 0}$ and
$(y_t)_{t \ge 0}$, called state and observations, respectively. Often, the state is modeled as a Markov chain
with the transition probability kernel $Q_t(x, y)$ and the initial distribution $\pi_0$. The observation
sequence is related to the state by

$$y_t = H_t(x_t) + v_t,$$

where $v_t$ is noise (not necessarily Gaussian). The a priori information contained in this model consists of
$\pi_0$, $Q_t$, and $g_t$, where $g_t(v)$ is the PDF of the noise $v_t$. This model is often referred to as a
hidden Markov model (HMM) because the Markov chain $x_t$ is hidden from the observer by a possibly
nonlinear transformation $H_t$ and noise $v_t$. The goal of nonlinear filtering is to compute at each time $t$
the posterior distribution $\pi_{t|t}$ of the state $x_t$ given the realization of the observation
$y_{0|t} = (y_0, \ldots, y_t)$.

In the discrete time setting, the algorithm of nonlinear filtering is very simple. It consists of two
steps:

$$
\begin{cases}
\text{prediction:} & \pi_{t|t-1}(x) = \int Q_t(x', x)\, \pi_{t-1|t-1}(x')\, dx', \\
\text{correction:} & \pi_{t|t}(x) = \Psi_t(x)\, \pi_{t|t-1}(x) \Big/ \int \Psi_t(x)\, \pi_{t|t-1}(x)\, dx,
\end{cases}
\tag{2.1}
$$

where $\Psi_t(x) = g_t(y_t - H_t(x))$ is the likelihood function. The second step is simply the Bayes formula.
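
The two steps of (2.1) can be illustrated on a finite state space, where the integrals reduce to sums. The following is a minimal sketch under stated assumptions, not code from this project: the state takes values in {0, ..., n-1}, the noise density is Gaussian, `pi0` is the distribution before the first processed observation, and all names (`random_walk_kernel`, `predict`, `correct`, `bayes_filter`, `posterior_mean`) and the example numbers are hypothetical.

```python
import math


def gaussian_pdf(v, sigma=1.0):
    """Noise density g(v); a zero-mean Gaussian is an illustrative assumption."""
    return math.exp(-0.5 * (v / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def random_walk_kernel(n, stay=0.6):
    """Hypothetical kernel Q[x_prev][x]: lazy random walk on {0, ..., n-1}.

    Moves that would leave the grid are folded back onto the boundary state,
    so every row sums to one.
    """
    move = (1.0 - stay) / 2.0
    Q = [[0.0] * n for _ in range(n)]
    for x in range(n):
        Q[x][x] += stay
        Q[x][max(x - 1, 0)] += move
        Q[x][min(x + 1, n - 1)] += move
    return Q


def predict(posterior_prev, Q):
    """Prediction: pi_{t|t-1}(x) = sum over x' of Q[x'][x] * pi_{t-1|t-1}(x')."""
    n = len(posterior_prev)
    return [sum(Q[xp][x] * posterior_prev[xp] for xp in range(n)) for x in range(n)]


def correct(predicted, y, H, g):
    """Correction (Bayes formula) with likelihood Psi_t(x) = g(y - H(x))."""
    weighted = [g(y - H(x)) * p for x, p in enumerate(predicted)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [w / total for w in weighted]


def bayes_filter(pi0, Q, H, g, observations):
    """Apply prediction and correction once per observation; return all posteriors."""
    posterior = list(pi0)
    history = []
    for y in observations:
        posterior = correct(predict(posterior, Q), y, H, g)
        history.append(posterior)
    return history


def posterior_mean(posterior):
    """State estimate as the posterior mean: sum over x of x * pi_{t|t}(x)."""
    return sum(x * p for x, p in enumerate(posterior))


# Usage: five states, uniform prior, nonlinear observation H(x) = x^2.
history = bayes_filter(
    pi0=[0.2] * 5,
    Q=random_walk_kernel(5),
    H=lambda x: float(x * x),
    g=gaussian_pdf,
    observations=[0.1, 1.2, 3.8, 9.3],
)
print([round(posterior_mean(p), 3) for p in history])
```

On a grid of n states each prediction step costs on the order of n^2 operations, which is one way to see why direct evaluation becomes expensive as the state space grows.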

The posterior distribution is the centerpiece of Bayesian estimation. Indeed, the nonlinear filter
$\hat{x}_t = \int x\, \pi_{t|t}(x)\, dx$ is an optimal (in the mean-square sense) estimator for the state
process $x_t$. Also, the posterior distribution is extremely valuable for the “visualization” of pattern
changes (see Figure 2.1).

Figure 2.1 shows the propagation of the posterior distribution of the position of an acutely maneuvering
small, low-SNR target (the signal-to-noise ratio (SNR) is -7 dB). The spatiotemporal (S-T) input is an IR
video stream. The brighter colors correspond to the higher values of the posterior distribution, and the green
line is the true trajectory.

In the continuous time case, the prediction and correction steps merge and the posterior distribution
$\pi_t(x) = \pi_{t|t}(x)$ is given by $\pi_t(x) = \phi_t(x) / \int \phi_t(x')\, dx'$, where $\phi_t(x)$ is a
solution of the stochastic differential equation

$$\dot{\phi}_t(x) = A^*(t, x)\, \phi_t(x) + H_t(x)\, \phi_t(x)\, \dot{v}_t, \qquad \phi_0(x) = \pi_0(x), \tag{2.2}$$

where $A$ is the generator of the transition probability kernel $Q_t(x, y)$, $A^*$ is its adjoint, and
$\dot{v}_t$ is white noise. For example, if the state process is given by the noisy kinematic equation
$\dot{x}_t = a(t, x_t) + \epsilon\, \dot{w}_t$, where $\dot{w}_t$ is white noise independent of $\dot{v}_t$, then

$$\dot{\phi}_t(x) = \frac{\epsilon^2}{2}\, \phi_t''(x) - \big(a(t, x)\, \phi_t(x)\big)' + H_t(x)\, \phi_t(x)\, \dot{v}_t. \tag{2.3}$$

Note that equation (2.3) is a stochastic partial differential equation and should be understood in the sense of
Ito calculus. Equation (2.2) is usually referred to as the Duncan-Mortensen-Zakai equation and its solution
$\phi_t(x)$ as the unnormalized filtering density (UFD). The posterior density $\pi_t(x)$ solves the Kushner
equation, which is similar to the Zakai equation but slightly more complicated. The theory of nonlinear
filtering was initiated in the 1960s (see, e.g., [20, 51, 79, 80, 160, 184, 197]). More contemporary reviews of
NLF methodology can be found in [46, 54, 72, 110, 141]. A comprehensive review of the state of the art of
NLF is presented in Crisan and Rozovskii [39].

Final Technical Report ARO MURI Grant # W911NF-06-1-0094: Spatio-Temporal Nonlinear Filtering with Applications to Information Assurance and Counter Terrorism

The simplicity of the nonlinear filtering algorithm is deceptive. While extremely effective and
stable, this algorithm is computationally expensive due to its polynomial complexity. The most taxing part of
the algorithm is, of course, computing the integrals, which has to be repeated online every time a new
observation arrives. Direct quadrature methods are effective only when the dimension of the state process
is not larger than 3. To overcome this complication, a new powerful methodology of particle filtering was
introduced and developed in the 1980s-1990s. The idea of this approach was to replace quadratures in
(2.1) by Monte Carlo averaging. At time $t$, the averaging is performed with respect to identically distributed
random particles

$$\left( x^{(1)}_{t-1|t-1}, \ldots, x^{(N)}_{t-1|t-1} \right)$$

with the probability distribution $\pi_{t-1|t-1}$ computed on the previous step using the recursion (2.1) with
quadratures replaced by Monte Carlo averaging. Generation of the particles

$$\left( x^{(1)}_{s-1|s-1}, \ldots, x^{(N)}_{s-1|s-1} \right)_{s = 0, 1, 2, \ldots}$$

is called the resampling procedure. It plays the central role in the success of the method. The nonlinear
filter based on the described procedure is often called a Markov chain Monte Carlo (MCMC) filter or particle
filter. Various versions of the optimal nonlinear filters based on Monte Carlo resampling were developed in
the 1990s, including the iterative particle filter (IPF) [109], the sampling/importance resampling (SIR)
particle filter [46], and the branching particle filter (BPF) [38]. For a review, see [46, 110].
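
The replacement of quadratures by Monte Carlo averaging can be sketched as follows. This is a minimal illustration of the plain bootstrap (SIR) scheme with multinomial resampling, not the IPF or BPF variants cited above and not the implementation used in this project. The scalar random-walk state model, the Gaussian likelihood, the constants `STATE_SIGMA` and `OBS_SIGMA`, and the function names are assumptions made only for the example.

```python
import math
import random


def sir_particle_filter(observations, n_particles, sample_prior,
                        sample_transition, likelihood, rng):
    """Bootstrap (SIR) particle filter; returns the posterior-mean estimates."""
    particles = [sample_prior(rng) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Prediction: move every particle through the transition kernel.
        particles = [sample_transition(x, rng) for x in particles]
        # Correction: weight each particle by the likelihood of the observation.
        weights = [likelihood(y, x) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case (all likelihoods underflow): use equal weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Monte Carlo average in place of the integral for the state estimate.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resampling: draw N particles with probabilities given by the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Hypothetical scalar model: random-walk state observed directly in noise.
STATE_SIGMA = 0.3
OBS_SIGMA = 0.5


def sample_prior(rng):
    """Draw from the assumed initial distribution pi_0 = N(0, 1)."""
    return rng.gauss(0.0, 1.0)


def sample_transition(x, rng):
    """Draw the next state from the assumed random-walk kernel."""
    return x + rng.gauss(0.0, STATE_SIGMA)


def likelihood(y, x):
    """Unnormalized Gaussian likelihood of observation y given state x."""
    return math.exp(-0.5 * ((y - x) / OBS_SIGMA) ** 2)


rng = random.Random(0)
estimates = sir_particle_filter([2.0] * 30, 1000, sample_prior,
                                sample_transition, likelihood, rng)
print(round(estimates[-1], 2))
```

The cost per observation is linear in the number of particles and does not depend on a spatial grid, which is the practical reason for preferring this scheme over direct quadrature when the state dimension grows.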

2.2.  Fundamental  Equations  of  Spatiotemporal  NLF  and  NLF  Algorithms  for  Complex  Hidden  Markov 
Models 

One  of  the  main  objectives  of  this  project  is  development  of  spatio-temporal  NLF  algorithms  and  their  DOD 
relevant  applications.  Certain  progress  in  this  direction  has  been  made  during  the  current  stage  of  the  grant. 

To address tracking in distributed images, we have derived a preliminary version of the Zakai and Kushner
equations for a spatial-temporal observation process with continuous and discrete observations, that is,
analogs of the Zakai and Kushner equations of nonlinear filtering in this setting.

The Zakai and Kushner equations of nonlinear filtering are theoretically sound and well understood. However,
even for comparatively simple systems they are too computationally intensive. To bypass this problem
we have developed new “smart models” of the state. It turned out that stationary and evolution systems
driven by space-only noise provide substantial computational advantages. During this period, we have
investigated systems of this type described by stochastic PDEs (see Lototsky and Rozovskii [97, 98, 99]).

To reduce the computational complexity we have also developed state models provided by telescoping
Markov processes. The simplest example of this type of process is referred to as interacting multiple
models (IMM). Our recent results will allow us to extend this methodology to very complicated systems.
Telescopic Markov processes are branching processes of a special kind. They can model enormously complicated
systems. However, processing of these systems, if done judiciously, can be performed online. The key to
this goal is to do estimation (filtering) and hypothesis testing (evaluation of the relevance of each branch)
simultaneously. The main benefit of this approach is the early truncation of “low priority” branches.

Partial testing of the obtained algorithm was done on the problem of tracking volatility in financial
markets, which present a great challenge because of their complexity [40, 41].

2.3.  Visual  Tracking  of  Dim  Extended  Targets 

Tracking of multiple moving and deforming objects (e.g., battlefield monitoring, visual surveillance in urban
environments, etc.) has always been a topic of substantial interest for military and intelligence services.
Tracking extended targets in image sequences is a fundamental problem with many applications of
ultimate relevance to the Army and DOD. While substantial progress in this field has been made in the
last decade, the development of a robust framework for tracking extended targets subject to obscuration,
changes in illumination, pose, etc. is still a challenge. We have developed NLF-based spatial-temporal
technology capable of tracking small targets with very low SNR (down to -7 dB) in plain images. This
technology can handle very dim moving targets with evolving appearance in
noisy and cluttered environments. Our method is based on a combination of nonlinear filtering for interacting
multiple models and a recently developed Bayesian marginalization technique. The introduction of interacting
multiple models makes it possible to account for evolving appearances of the targets due to maneuvering,
obscuration, etc. The marginalization technique dramatically reduces the computational complexity of the algorithm.

In particular, it makes real-time multitarget tracking possible. The precision of the NLF algorithm allows
surveillance at low SNR.

Figure  2.2  illustrates  the  results  of  stabilization  and  target  tracking  with  the  NLF  algorithm  for  a  very 
difficult  UAV  scenario  that  includes  severe  translations  and  rotations. 

Figure 2.2: Video tracking for UAV data.

Further results from applying this approach to the detection and tracking of low-observable targets
in video and IR images, and its efficiency compared to industry-standard methods, are demonstrated in
Chapter 13, Section 6.

3.  Nonlinear  Filtering  for  Tracking  and  Identification  of  Intentions 

One of the central directions of Dr. Rozovsky’s research is the application of nonlinear filtering to tracking
complex distributed systems. In the course of this project this research was extended beyond physical
systems. In particular, the current methodology is designed to track and identify plans and intentions rather
than position, identity, direction of motion, etc. The mathematical foundation of this methodology is based
on spatial-temporal processes coupled with “telescopic” Markov processes. This presents a substantial
extension of the current paradigm of Hidden Markov Models.

Two approaches for modeling the state process were investigated: a direct telescopic Markov chain, and a
fast mean-reverting diffusion model parameterized by a simple telescopic Markov chain. Low
computational complexity is the main advantage of the new approach based on the mean-reverting diffusion.
It allows us to deal with very complex multidimensional systems, which
is extremely important for the needs of practical video surveillance.

Figure 2.3 illustrates the results of motion detection and tracking in a sequence of images. Despite the
fact that the object has very low visibility, its motion is tracked very reliably.

4.  Nonlinear  Filtering  Methods  for  Tracking  Hidden  Attributes  and  Inference  of  Collective 
Behaviors 

4.1.  Tracking  Hidden  Attributes 

In 2010, A. Papanicolaou and B. Rozovsky developed a new algorithm for parallelizing computations
and applied it to the Pinocchio example (see Figure 2.4). The resulting speed-up credited to the parallel
architecture for Pinocchio was quite significant. This example provided us with a sense of what kind of
parallel architecture would be appropriate for general nonlinear filtering algorithms for video tracking. It
was developed into a reasonably universal algorithm for parallelizing nonlinear filtering algorithms for
spatial-temporal data.

Figure 2.3: Motion detection and tracking in images.

Figure 2.4: Tracking Pinocchio in high noise.

Algorithms for tracking complex objects require high-resolution image data as input. This results in
a large set of computations to obtain the statistics for an observed image given every possible hypothesis.
We have developed a parallel architecture to distribute the tasks associated with calculating image statistics
required to temper the posterior distribution. In particular, for the particle filter that has been used in our
algorithm, computing the main covariance term

$$(y_k - H(x^{(i)}))^{\top} R^{-1} (y_k - H(x^{(i)}))$$

is hardly possible without extensive parallelization. Standard parallelization algorithms used for similar
computations are based on the so-called master-slave approach. However, the master-slave architecture is
effective only if the sample size $N$ is relatively small. As $N$ grows, the workload delegated to the master rank
will eventually surpass the workload of each slave rank, and such a situation is not a good use of computing
resources because the bulk of the processing power will idle while the master rank works. In other words,
the workload is not well balanced; there is a “workload imbalance.” For large $N$, we have implemented
a new parallel architecture that distributes the global workload equally among all ranks but does
not become crippled by the cost of communicating particle weights, particularly when importance sampling
is invoked. The algorithm also takes advantage of importance sampling and bootstrap sampling. However,
a regular bootstrap would defeat the purpose of locally independent particle filters because it creates a
complicated global exchange of all local particle data among all of the ranks. To deal with this problem we have
implemented a truncated global bootstrap sampling in which weights below a certain threshold are truncated
to zero when importance sampling is invoked.
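
The quadratic form and the weight truncation described above can be sketched in serial form as follows.
This is only an illustration of the per-particle computation and of thresholding before a bootstrap draw;
it does not reproduce the parallel architecture, and the function names, the Gaussian weight formula, and
the threshold value are assumptions made here.

```python
import numpy as np


def quadratic_forms(y, predicted, R):
    """Compute (y - H(x_i))^T R^{-1} (y - H(x_i)) for every particle i.

    `predicted` has shape (N, m); row i holds H(x_i) for particle i.
    """
    residuals = y - predicted
    # Solve R z = r_i for all particles at once instead of forming R^{-1}.
    solved = np.linalg.solve(R, residuals.T).T
    return np.einsum("ij,ij->i", residuals, solved)


def truncated_weights(q, threshold):
    """Gaussian weights from quadratic forms, with small weights set to zero."""
    log_w = -0.5 * q
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    w[w < threshold] = 0.0           # truncation step
    if not w.any():
        raise ValueError("threshold removed every particle")
    return w / w.sum()


def bootstrap_resample(particles, weights, rng):
    """Draw N particles with replacement according to the (truncated) weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```

Particles whose weight is truncated to zero can never be drawn, so in a distributed setting their data
would not need to be exchanged; that saving is the motivation stated in the text, not something this
serial sketch measures.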

Figure  2.4  illustrates  the  result  of  tracking. 

The  related  results  were  partially  reported  in  the  paper  A.  Papanicolaou  and  B.  Rozovsky,  “Tracking 
Hidden  Attributes,”  Submitted  to  SIAM  J.  Imaging  Sciences,  2010. 

4.2. Inference of Collective Behaviors: Intent and Target Identification in a Bird Flock

An important direction of our research is tracking the intentions of very large groups of agents that exhibit
both group and individual behavior. Collective behavior is manifested in a variety of complex systems,
ranging from bacterial colonies, bird flocks, groups of terrorists, insect swarms, and troops on the march
to pedestrians. The hallmark of collective behavior is the emergence of self-organization: numerous
but simple local interactions among components result in a complex global behavioral pattern. Besides its
magnificence, collective behavior has been shown to provide benefits to the members of a group in many
facets of their lives, such as foraging efficiency and reproduction. One of the features that make these possible
is a sudden coherent change in direction. This is achieved through rapid transfers of directional information,
which are more efficient and faster than direct communications.

Rapid coherent changes are typically induced by a combination of external and internal perturbations.
The presence of predators is an example of an external perturbation, while intrinsic noise is an example of an
internal one. Real-time prediction and filtering of these changes are critical to controlling and monitoring
collective groups. Quantities related to sudden changes are usually hidden and accumulate slowly over a long
period of time. After a certain threshold, however, these quantities have a massive effect on the apparent dynamics
of a group and produce rapid collective changes. Our objective in this study is to monitor and predict the hidden
quantities controlling the sudden coherent changes. Furthermore, we seek to identify the source of these
changes. Specifically, we are interested in the collective landing of a bird flock, which usually occurs rather
abruptly from horizontal flight over a food source. We model a flock of birds performing foraging flight
over a field with sporadic food sources. Each bird flies in harmony with the others and has an individual landing
intent that evolves over time, mainly increasing due to hunger. If some birds identify a food source, their
landing intent increases rapidly and a part of them starts landing. Via local interaction that is analogous
to the exchange of social information, other birds begin landing. We model each bird as a self-propelled
particle, that is, a lifeless particle governed by Newton’s laws. In this framework,
the social interactions between birds can be modeled as Newtonian forces or interaction potentials. The
landing intent and the availability of a food source are incorporated into the model as internal variables.
Since these are hidden, an appropriate framework for this inference problem is the telescopic Markov chain
(TMC), where the hidden variables reside at higher levels in a system hierarchy and evolve according to their
own dynamics conditioned on the higher-level variable. Figure 2.5 shows the evolution of a 30-bird flock.
After an initial transient stage, the group of birds maintains steady flight until a sudden landing phase. Boxed
parts in the left figure (a) are magnified in (b) and (c), respectively. The arrows represent the velocities of the birds.

The inference problem for the TMC is similar to that for a hidden Markov model (HMM) applied to a vector-valued
Markov chain. However, due to its iterative structure, the TMC dramatically reduces the dimension of the state
space. In this work, the observation consists only of the location of each bird in the flock at discrete time steps.
Since the lowest level of the TMC in our model can be cast as a linear dynamical system, the proper inference
model at this level is the Kalman filter. Combining the TMC with the Kalman filter, the inference problem can
be formulated in the framework of interacting multiple models.
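
For the lowest, linear level mentioned above, one Kalman predict/update cycle with position-only
observations can be sketched as follows. The constant-velocity model, the noise levels, and the function
names are hypothetical choices made for illustration; the flock model of the report and the coupling to
the higher TMC levels are not reproduced here.

```python
import numpy as np


def kalman_step(mean, cov, y, F, Q, H, R):
    """One predict/update cycle of the Kalman filter."""
    # Predict the state and its covariance one step ahead.
    mean_pred = F @ mean
    cov_pred = F @ cov @ F.T + Q
    # Update with the new observation y.
    innovation = y - H @ mean_pred
    S = H @ cov_pred @ H.T + R
    # Gain K = P H^T S^{-1}, computed with a linear solve (P and S are symmetric).
    gain = np.linalg.solve(S, H @ cov_pred).T
    mean_new = mean_pred + gain @ innovation
    cov_new = (np.eye(mean.size) - gain @ H) @ cov_pred
    return mean_new, cov_new


def track_constant_velocity(positions, dt=1.0, q=1e-4, r=1e-2):
    """Filter 1D position observations with an assumed [position, velocity] state."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity dynamics
    H = np.array([[1.0, 0.0]])             # only the position is observed
    Q = q * np.eye(2)
    R = np.array([[r]])
    mean, cov = np.zeros(2), 10.0 * np.eye(2)
    for p in positions:
        mean, cov = kalman_step(mean, cov, np.array([p]), F, Q, H, R)
    return mean, cov
```

In an interacting-multiple-model setting, one such filter would typically be run per mode, with the mode
probabilities updated from the innovation likelihoods; that step is omitted from this sketch.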

The model and algorithm developed in this work have great potential for other fields of interest, such
as recognition of emotions from facial expressions, multi-target tracking and element recognition, plan
recognition in multi-agent systems, and cooperative decision making. A straightforward application
arises in the context of anti-terrorism, as an extension of the previous work of Papanicolaou and
Rozovsky. While intent inference in that previous work is performed for one agent from noisy video footage,
the result here is to be used for intent inference in a multi-agent system as well as for target identification.

Figure 2.5: Evolution of a bird flock.

Several possible improvements are left for future research. First, we need to develop an elaborate model
for agent interactions that properly reflects a real-world situation of information exchange. This includes
a model for the communication mechanism as well as one for agent relationships. Since the relationship between
agents could be cooperative or self-seeking, game-theoretic approaches would be needed. Second, the role of
noise needs to be investigated. In the biological context, noise is found to be beneficial to rapid transitions.
However, little is known about the social context. Last, a simple Markovian assumption may not be
sufficient to explain complex behaviors and may need to be modified. Collective memory, which considers
knowledge of the current state as well as the past, would be needed for complex inference problems. The
related results were partially reported in the paper by Park, Rozovsky, and Sowers [120].

5.  Uncertainty  Quantification  in  Covert  Networks 

In collaboration with S. Lototsky (USC) and X. Wan (Princeton) we have developed a novel approach to
uncertainty quantification (UQ) for large stochastic systems. It applies to various complex random structures,
in particular to covert networks. Terrorist networks are a singularly important application of this methodology.

The main achievement of our program is the UQ method for recovery/estimation of a system’s structure
from the “camouflaged” data available for observation.

Our approach to UQ in large stochastic systems is based on a powerful technique that originated in quantum
probability. This technique is often referred to as the polynomial chaos approach. We use this technique to
filter the noisy information gathered in the process of system monitoring and to discover the true structure
of the system (e.g., a covert network). The theoretical and numerical foundations of the proposed methodology
were described in our recent papers.

The results of this research are published in the Proceedings of the National Academy of Sciences, USA
(see the paper “A new stochastic modeling methodology based on weighted Wiener chaos and Malliavin
calculus” by Wan, Rozovskii, and Karniadakis [182]). A more general setting was considered in the paper
“Elliptic equations of higher stochastic order” by Lototsky, Rozovskii, and Wan [100]. The mathematical
foundations for these developments are documented in “A unified approach to stochastic evolution equations
using the Skorokhod integral” by Lototsky and Rozovskii [98] and “Stochastic differential equations driven
by purely spatial noise” by Lototsky and Rozovskii [99].

6. Conclusion

Our  results  prove  that  optimal  spatiotemporal  NLF  methods  allow  for  detection  and  reliable  tracking  of 
small  dim  objects  when  standard  tracking  methods  fail  (see  also  Chapter  13). 

The  developed  novel  approach  to  uncertainty  quantification  can  be  applied  to  covert  terrorist  networks 
to  recover  their  true  structures. 

Current technology for precision positioning/navigation of aircraft, trucks, ships and other moving
platforms critically depends on the availability of satellite guidance (GPS, etc.). Major terror acts or war could
disrupt satellite-based navigation. Also, some covert Army operations might require electronic silence. If
this is the case, standard satellite-based navigation must be ruled out. In contrast, the NLF-based
algorithms are fully autonomous in that they do not require satellite-based inputs. Instead, they utilize a recently
developed electronic Geographic Information System (GIS), which is available in an on-board version.

Chapter 3


Target  Tracking  Concepts  for  Distributed 
Targets  with  Unknown  Shapes:  Multiple 
Target  Tracking  and  Tag  &  Track 

The  material  of  this  chapter  is  based  on  the  results  of  the  group  of  Dr.  Medioni,  USC  (Sections  1-9)  and  Dr. 
Chan,  UCLA  (Section  10). 

1.  Introduction 

Extended target tracking is a fundamental problem in video surveillance, as it provides the description of
spatio-temporal relationships between observations and targets in the scene. As discussed above, there are
two core subtasks that we aim to address: one is Multiple Target Tracking and the other is Tag-and-Track.
Both subtasks have wide application areas in visual surveillance and content-based video analysis.

Detecting and tracking multiple moving objects from a moving platform, e.g., moving objects from an
airborne camera, presents the following challenges. As the size of objects is relatively small from an airborne
view, appearance-based detectors suffer from lack of resolution and blurry images. On the other hand, the
motion-based object detection approach relies on the stabilization of the parametric camera motion model
(affine or homography). Moving objects are defined as the areas that have not been stabilized. This method
works well when the scene can be considered planar, or when the motion of the camera is pan/tilt/zoom.
However, 3D depth in the scene produces pixel displacement that cannot be accounted for by the global
parametric model, usually termed parallax. Other difficult cases affect detection and tracking in airborne
videos, such as abrupt illumination changes, registration errors and occlusions. Many approaches have
been proposed to improve motion detection and tracking on frame-by-frame and pixel-by-pixel bases, e.g.,
global illumination compensation [190], parallax filtering [195], or detection using contextual information
[68, 188]. Not much attention has been paid to analyzing the long-term motion pattern of moving objects,
which is a distinctive property for moving vehicles in airborne videos. Conceptually similar to
“track-before-detect” techniques, we aim to involve temporal information in the process as early as possible.
Indeed, detection and tracking are coupled: if perfect detection is given, tracking becomes relatively
straightforward; on the other hand, if we know the motion and trajectory of an object, detection is easier.
However, surveillance videos are often of low quality, and the size of moving objects is relatively small from
an airborne view, so the lack of resolution makes it difficult or sometimes even impossible for appearance-based
detectors to work in complex scenes.

Another important subtask is to track a single object of interest. This tracking subtask does not
require compensation of camera motion and does not require any prior knowledge of the object of
interest. The object of interest can be any object that is specified by a user during initialization, e.g.,
a person, a vehicle, or an airplane. The essential problem in this subtask is to establish online an
appearance model that describes the appearance of an extended target, and to update the model on the fly to adapt
to appearance changes, which can be caused by varying viewpoints and illumination conditions. Appearance
can also change relative to the background due to the emergence of clutter and distracters. Also, an object may
leave the field of view (or be occluded) and reappear. Another critical challenge arises when
objects similar to our target appear in the frame, which makes it hard to model the appearance of the
target so as to distinguish it from the others. While generative models easily fail to separate them due to
their lack of discriminative power, discriminative models lead to an over-fitting problem when trying to do so.
Moreover, real-time performance in this subtask is an important and challenging requirement.

The main objective of this work is to provide the capability to track single or multiple extended targets
from a moving platform, e.g., from an airborne camera. Tracking is a critical component of video
surveillance, as it provides the description of spatio-temporal relationships between observations and targets in the
scene required by activity recognition modules for surveillance purposes. There are two subtasks that we aim
to deal with: 1) Multiple Target Tracking and 2) Tag-and-Track.

For multiple target tracking, we proposed a general framework that makes use of spatio-temporal
consistency in both motion and appearance and does not require a one-to-one mapping between
observations and targets. Inferring the association and target states from current observations essentially
suffers from the ambiguity existing in data association [32, 194]. Our method works under a deferred logic
framework, where the decision is made when enough observations are obtained. In order to deal with the
high computational complexity of such an association scheme, a Data-Driven Markov Chain Monte Carlo
(DD-MCMC) [177] method is proposed to sample the solution space. Both spatial and temporal association
samples are incorporated into the Markov chain transitions. We also proposed a tracklet-based approach
that tries to overcome some of the disadvantages of existing methods. It does not require an exhaustive
evaluation of data association hypotheses. It does not assume a one-to-one mapping between observations and
objects, and it provides a confidence measure on each tracklet. The algorithm accomplishes this by
formulating the tracking problem as inference in a set of Bayesian networks, and uses consistency of motion and
appearance as the driving force. The computed tracklets are then used in a complete multi-object tracking
algorithm. Moreover, we proposed to analyze the motion patterns formed by moving objects over time, which
provide a distinctive property for detecting single or multiple moving objects in a spatio-temporal volume. We
first provide a straightforward geometric interpretation of a general motion pattern in the 4D space $(x, y, v_x, v_y)$,
which can describe a large number of commonly seen 2D motion patterns, e.g., traffic at a busy intersection
or crowds on a sidewalk. We propose to use the Tensor Voting computational framework to detect and segment
such motion patterns in 4D space. Beyond segmenting motion patterns, we apply this technique to facilitate
the detection and tracking of each individual object in such a motion pattern.

For the tag-and-track problem, in order to track and reacquire an unknown object with limited labeled
data, we propose to learn its appearance variations online and build a model that describes all appearances
seen while tracking. To address this semi-supervised learning problem, we propose a co-training based
approach to continuously label incoming data and update a hybrid discriminative generative model online.
Its running time was then improved by using a co-trained cascade particle filter framework. The cascade
organization of the particle filter enables the efficient evaluation of multiple appearance models with
different computational costs, thus improving the speed of the tracker. Also, we proposed a method to handle
partial occlusion, which is a critical factor causing drift in tracking. Moreover, we proposed to use context
information to enhance the performance of the tracker and to deploy it in a fully automatic surveillance
system where a pedestrian is detected and an active camera follows him to acquire a high-resolution face
sequence.

2.  Spatio-Temporal  Monte  Carlo  Markov  Chain  Data  Association 

Assume that there are $K$ unknown moving targets within the time interval $[1, T]$. Let $y_t$ denote the set of
foreground regions at time $t$, and let $Y$ be the set of all available foreground regions within $[1, T]$. Here, we
define the tracking problem as: given the observation $Y$, infer an unknown number $K$ of tracks
$\omega = \{\tau_0, \tau_1, \ldots, \tau_K\}$, where $\tau_0$ is the set of false alarms and $\tau_k$ is the $k$th track.
Each $\tau_k$ in $\omega$ is defined as a sequence of shapes.
For simplicity, we use rectangles to represent the shapes covering the foreground regions. In the case of a
single target with perfect foreground segmentation, the set of MBRs (Minimum Bounding Rectangles) of
each foreground region at different times forms the best cover of the target. However, when inter-occlusion
between multiple targets and noisy foreground segmentation exist, it is not trivial to find the optimal cover.
In our framework, the tracking problem is formulated as maximizing a posterior (MAP) of a cover of
foreground regions, given the set of observations $Y$: $\omega^* = \arg\max_{\omega} p(\omega|Y)$. By introducing the
concept of a cover, we overcome the one-to-one assumption: at each time instant, one foreground region can be
covered by more than one target, and one target can cover more than one foreground region as well.

To find a cover with reasonable properties, we first define a prior model that considers the following
criteria: we prefer long tracks with few false alarms. In addition, one track should have little overlap with
other tracks. We adopt the prior probability of a cover $\omega$ as the product of the individual prior terms. Also,
we consider a probabilistic framework for incorporating two likelihood terms: the motion likelihood $L_m$ and the
appearance likelihood $L_a$. We represent the elements (rectangles) in track $k$ as
$(\tau_k(t_1), \tau_k(t_2), \ldots, \tau_k(t_{|\tau_k|}))$, where $t_i \in [1, T]$ and $(t_{i+1} - t_i) \ge 1$, since missed
detections may happen. Given one cover, the motion and appearance likelihood of a target is assumed to be
independent of other targets. The joint likelihood of a cover can be factorized as in Eq. 3.1.

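$$
\begin{aligned}
p(Y|\omega) &= \prod_{k=1}^{K} L(\tau_k) = \prod_{k=1}^{K} \prod_{i=1}^{|\tau_k|-1} L(\tau_k(t_{i+1}) | \tau_k(t_i)) \\
&= \prod_{k=1}^{K} \prod_{i=1}^{|\tau_k|-1} L_m(\tau_k(t_{i+1}) | \tau_k(t_i)) \, L_a(\tau_k(t_{i+1}) | \tau_k(t_i))
\end{aligned}
\qquad (3.1)
$$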

With some manipulations, we combine the prior and the likelihood $p(Y|\omega)$ in Eq. 3.1 to obtain the whole
posterior, represented in Eq. 3.2.


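$$
\begin{aligned}
p(\omega|Y) &\propto \exp\{-C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\} \\
S_{len} &= -\sum_{k=1}^{K} |\tau_k|, \quad
S_{olp} = \sum_{t=1}^{T} \Gamma(t), \quad
S_{app} = \sum_{k=1}^{K} \sum_{i=1}^{|\tau_k|-1} D(\tau_k(t_i), \tau_k(t_{i+1})) \\
S_{mot} &= \sum_{k=1}^{K} \sum_{i=1}^{|\tau_k|-1} \left( \log\left(\det(P_i)^{1/2}\right) + \tfrac{1}{2} e_i^{\top} P_i^{-1} e_i \right)
\end{aligned}
\qquad (3.2)
$$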


where $C_0, \ldots, C_4$ are positive real constants, which are determined automatically by Linear Programming
in the training phase [192]. Eq. 3.2 reveals that the MAP estimation is equivalent to finding the minimum of
an energy function. The tradeoff between prior and likelihood will lead to a MAP solution.
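
As a worked restatement that uses only the symbols of Eq. 3.2, taking the negative logarithm of the
posterior gives the energy whose minimizer is the MAP cover. The symbol $E(\omega)$ is notation introduced
here for illustration and does not appear in the original text.

```latex
E(\omega) = C_0 S_{len} + C_1 K + C_2 F + C_3 S_{olp} + C_4 S_{app} + S_{mot},
\qquad
\omega^{*} = \arg\max_{\omega} p(\omega|Y) = \arg\min_{\omega} E(\omega).
```

Because the exponential is monotone, the unknown normalizing constant of the posterior plays no role in
the minimization.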

Searching in such a solution space for Eq. 3.2 is not trivial. We propose to use a data-driven MCMC to
estimate the best spatio-temporal cover of foreground regions. To ensure that detailed balance is satisfied,
the Markov chain is designed to be ergodic and aperiodic. It is also important to design samplers that
converge quickly.

In  our  proposal  distribution,  the  sampler  contains  two  types  of  moves:  temporal  and  spatial  moves. 
Temporal  moves  correspond  to  changing  the  labels  of  rectangles  at  different  time  instants,  while  spatial 
moves  change  covering  rectangles  at  one  time  instant. 

The input to the algorithm is the set of original foreground regions $Y$, an initial cover $\omega_0$, and the total
number of samples $n_{mc}$. Each move is sampled according to its own prior probability. Note that, instead of
keeping all samples, we only keep the cover with the maximum posterior, since we do not need the whole
distribution but only the MAP estimate. For the same reason, there is no burn-in procedure. The rectangles in the
initial cover $\omega_0$ are directly obtained from the MBRs of foreground regions. Given the stationary distribution
$\pi(\omega) = p(\omega|Y)$, the acceptance ratio $A(\omega, \omega')$ is defined as follows.


$$A(\omega, \omega') = \min\left(1, \frac{\pi(\omega')\, q(\omega \mid \omega')}{\pi(\omega)\, q(\omega' \mid \omega)}\right) \tag{3.3}$$
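The search procedure described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the names `map_search`, `energy` and `propose` are hypothetical, `energy` stands in for the negative log posterior of Eq. 3.2, and the toy state is a single integer rather than a spatio-temporal cover. The sketch shows only two points from the text: the acceptance test of Eq. 3.3, and the fact that only the best sample is kept, with no burn-in.

```python
import math
import random


def map_search(omega0, energy, propose, n_mc, rng):
    """Metropolis-Hastings search that keeps only the lowest-energy state.

    energy(omega)        -> E, with pi(omega) proportional to exp(-E)
    propose(omega, rng)  -> (omega_new, log_q_ratio), where
        log_q_ratio = log q(omega | omega_new) - log q(omega_new | omega)
    """
    current, e_cur = omega0, energy(omega0)
    best, e_best = current, e_cur
    for _ in range(n_mc):
        cand, log_q_ratio = propose(current, rng)
        e_cand = energy(cand)
        # log of pi(w') q(w|w') / (pi(w) q(w'|w)); accept with prob. min(1, ratio)
        log_ratio = (e_cur - e_cand) + log_q_ratio
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, e_cur = cand, e_cand
            if e_cur < e_best:  # keep only the MAP candidate, no burn-in
                best, e_best = current, e_cur
    return best, e_best


# Toy usage (hypothetical): the state is an integer and the energy is quadratic.
def toy_energy(x):
    return (x - 7) ** 2


def toy_propose(x, rng):
    return x + rng.choice([-1, 1]), 0.0  # symmetric proposal, so log ratio is 0


best, e_best = map_search(0, toy_energy, toy_propose, 2000, random.Random(0))
```

Working in log space avoids overflow when the energy difference is large. In the tracking setting, `propose` would choose between the temporal and spatial moves.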


To demonstrate the concept of our approach, we designed simulation experiments. In an L × L square
region there are K (an unknown number of) moving discs. Each disc has an independent color appearance
and an independent constant velocity and scale change in the 2D region. False alarms (non-overlapping
with targets) are located uniformly at random (u.a.r.) in the scene, and the number of false alarms follows a
uniform distribution on [0, FA]. We compare the tolerance to target density and false alarms with that of
other methods, including a JPDAF [12] based method from [76], the MHT from [36] and our own algorithm
with only temporal moves. For each setting, we generate 20 sequences, and each sequence contains T = 50
frames. The MCMC sampler was run for a total of 10K iterations, where the first 15% of the iterations consist
solely of temporal moves. The average score from multiple runs of our method is reported. All four methods
employ the same motion and appearance likelihood. Figure 3.2(a) compares the performance as the
number of targets increases. Figure 3.2(b) shows the tolerance of the different methods to false alarms.
Because we consider the spatial and temporal association seamlessly, our method is able to handle cases
where split or merged observations exist.

Figure 3.1: Simulation result for L = 200, N = 7, FA = 7 and T = 50. Targets may split or merge when they
appear.

Figure 3.2: (a) STDA as a function of N, the maximum number of targets; (b) STDA as a function of FA,
the number of false alarms.


In summary, we proposed a framework to find a globally optimal spatio-temporal association that
maximizes the consistency of motion and appearance of targets over time. Our method overcomes problems
encountered with one-to-one mapping between observations and targets. A data-driven MCMC method is
used to sample the solution space efficiently, and the forward and backward inferences enhance the search
performance. Compared to other data association algorithms, the proposed method shows remarkable
improvement both temporally (i.e. consistency of labels) and spatially (i.e. accuracy of outlined regions). The
work can be extended along the following lines. First, the target motion model can be extended to a more
general model. Second, our framework can naturally incorporate object model information in two ways: 1)
a model likelihood can be assigned to each node to extend our likelihood function; 2) model
information can be used to drive the MCMC proposal.

To track from a moving camera, we need to project targets at different times into a common reference
frame. Accumulated errors are introduced when fixed coordinates are selected and no further alignment is
performed. Usually the first frame [76] or the ground plane in the first frame is selected as the reference
frame. Moreover, due to scale change, image coordinates of the targets are not meaningful. Here, we
propose to use a global map (a satellite image) as the reference frame. By registering UAV (Unmanned
Aerial Vehicle) images with the satellite image, we can generate the absolute geo-location of targets. Also,
tracking is performed in geo-coordinates, which have a clear physical meaning. In surveillance applications,
occlusion is common. We introduce a two-step procedure for tracking with occlusion. The first step (called
local association) links detected regions within a sliding window and generates tracklets. The second step
(called global association) links the tracklets to form longer tracks and maintains track IDs.

Figure 3.3: Overview of the geo-tracking framework.


Figure  3.4:  Geo-mosaicing  2000  consecutive  frames  on  top  of  the  reference  frame. 

We have discussed that data association is essential for successful tracking. We use the MCMC data
association algorithm described above for local data association, since errors in local association are
not rectified in the global one. We formulate local association as multiple-target tracking, in which
the purpose is to find the best partition of the observation graph (i.e. the graph of detected moving regions).
In the global association, by assuming a maximum speed and acceleration of targets in geo-coordinates, we can
define the compatibility of tracklets, which reduces ambiguity in tracklet association. In addition, we adopt
rotation-invariant appearance descriptors [77] to represent both the color and the shape distribution of targets in
each tracklet. An overview of the tracking framework is shown in Figure 3.3.

The geo-registration result is shown in Figure 3.4. Figure 3.5 shows the tracking result on the sequence
with multiple moving targets. Again, when targets are occluded by shadows, local data association may
lose the track identification, and thus tracklets are formed. The missed detections caused by occlusion can even
last longer than the sliding window of local data association (45 frames). In global data
association, however, the tracklets are associated with the correct ID throughout the video. The different tracks are
listed in the Z direction in different colors. Figures 3.5(b), 3.5(c), 3.5(d) and 3.5(e) show the beginning frames of
the tracklets of the red truck. Although the appearance of the white van and the white SUV in 3.5(b) is quite
similar, the temporal and spatial constraints on the global map prevent them from being associated.

Figure 3.5: The tracklets and tracks obtained using the local and global data association framework. The
UAV image sequence is overlaid on top of the satellite image. (a) The tracking result with the UAV images
geo-mosaiced on the satellite image. (b)-(e) Beginning frames of the tracklets.


3.  Inferring  Tracklets  for  Multi-Object  Tracking 

One of the keys to the success of the DD-MCMC approach proposed above is an efficient exploration
of the search space. It tries to make more informative moves, and is thus effective enough to be used
in practice. However, when applied in the aerial context, where a large number of objects are detected,
the MCMC-based approach is overwhelmed because an immense number of samples is needed to converge
to a good solution. In addition, the low frame rate causes many difficulties in just initializing the set of tracks.
With bad initialization and a large number of detections, the problem becomes intractable for the
MCMC-based approach. To address this issue, we proposed a tracklet-based approach that overcomes some
of the disadvantages of existing methods. It does not require an exhaustive evaluation of data association
hypotheses. It is also a MAP estimate, rather than a heuristic. It does not assume a one-to-one mapping between
observations and objects, and it provides a confidence measure on each tracklet. The algorithm accomplishes
this by formulating the tracking problem as inference in a set of Bayesian networks, and uses consistency of
motion and appearance as the driving force. The computed tracklets are then used in a complete multi-object
tracking algorithm.

The goal of our algorithm is to infer tracklets, each representing one object, over a (sliding) window of
frames. This window is usually 4-8 seconds in length. The input to our algorithm is a set of object detections
(blobs) in each frame. These can be as simple as connected components taken directly from background
subtraction, or they can be the output of a more complex object detector. Each object detection also has
an associated appearance representation, such as the raw image patch or a histogram. We would like to
emphasize that our goal is to find valid tracklets within a window that shifts with each frame. Aggregation
of these tracklets into tracks that span several windows is done by the (higher-level) tracking algorithm.
Also, if the detections of an object become split (or merged) for a period longer than the window size, this
algorithm will find several (or one) tracklets in the window. This must be handled at the higher level as
well. Details of a multi-object tracker that takes care of these issues are not given here, but the current
implementation is simple. A flow chart that clarifies multi-object tracking using tracklets is given in Figure 3.6.

Figure 3.6: The role of tracklets in multi-object tracking.

We  do  not  assume  an  a  priori  number  of  objects  in  the  scene,  and  the  number  can  vary  over  time.  As  a 
result,  we  assume  that  each  detection  in  the  first  frame  of  the  window  is  a  potential  object.  Therefore,  we 
find  an  optimal  tracklet,  or  a  set  of  tracklets,  starting  at  each  detection  in  the  first  window  frame.  This  is  not 
a problem, because for detections that are false alarms, the model of a valid tracklet (consistency of motion
and  appearance)  is  not  satisfied,  and  the  tracklet  is  discarded.  Tracklets  that  start  in  the  second  or  later  frame 
of  the  window  are  found  when  the  sliding  window  shifts  to  that  frame. 

3.1.  Problem  Formulation 

If the initial detection of an object is given to us, we know there must be another detected instance of that
object located “nearby” in subsequent frames. For now we assume there are no missed detections (due to
occlusion or otherwise), but split detections, or split-merge detections, do not pose a problem, and this
statement still holds. Therefore, the optimal tracklet, or set of tracklets, that we want to find must be
composed of a series of “nearby” detections. This can be expressed in a detection tree. For a window size
of T frames, this tree has T levels. A node in level t has links to those nodes in level t + 1 that
are “nearby.” The root of the tree, t = 0, represents the initial detection.
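As an illustration only, the detection tree can be sketched as follows. Several things here are assumptions rather than details given in the text: “nearby” is taken to mean within a Euclidean `radius`, a detection reachable from several parents is linked to the closest one, and the function name `build_detection_tree` and the dictionary layout are hypothetical.

```python
import math


def build_detection_tree(root, frames, radius):
    """Build the levels of a detection tree rooted at the initial detection.

    root    : (x, y) position of the initial detection (level t = 0)
    frames  : frames[t] lists the (x, y) detections of the next window frames
    radius  : assumed threshold defining "nearby"
    Returns a list of levels; each node stores its position and the index of
    its parent in the previous level.
    """
    levels = [[{"pos": root, "parent": None}]]
    for detections in frames:
        prev = levels[-1]
        level = []
        for pos in detections:
            # Candidate parents: nodes of the previous level within the radius.
            near = [(math.dist(pos, node["pos"]), idx)
                    for idx, node in enumerate(prev)
                    if math.dist(pos, node["pos"]) <= radius]
            if near:
                level.append({"pos": pos, "parent": min(near)[1]})
        if not level:  # no nearby detection: the tree ends early
            break
        levels.append(level)
    return levels


levels = build_detection_tree(
    root=(0.0, 0.0),
    frames=[[(1.0, 0.0), (10.0, 10.0)], [(2.0, 0.0), (11.0, 10.0)]],
    radius=2.0,
)
```

In this toy input the detections near (10, 10) are never linked to the root, so they do not enter the tree and are left for the trees rooted at other initial detections.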

The number of possible tracklets and tracklet combinations arising from this detection tree is huge, and
we  certainly  do  not  want  to  evaluate  every  hypothesis.  Instead,  we  realize  that  every  such  hypothesis  is 
making  a  decision  about  including  or  not  including  each  detection.  In  other  words,  this  is  just  a  binary 
labeling,  or  segmentation  problem.  The  valid  detections  need  to  be  separated  from  the  invalid  detections. 
The  invalid  detections  are  detections  due  to  noise,  or  due  to  objects  other  than  the  one  that  generated  the 
initial  detection.  One  consequence  of  this  view  is  that  given  the  valid  detections,  it  is  not  always  known 
which  tracklets  generated  them.  For  example,  when  there  are  multiple  valid  detections  in  several  window 
frames,  it  could  be  that  a  single  object  generated  them  (and  they  correspond  to  multiple  split/merge  events), 
or  it  could  be  two  (or  several)  objects  being  very  close  to  each  other.  It  may  seem  that  nothing  was  gained 
by  the  segmentation,  but  actually  solving  this  problem  is  easier  than  before,  because  the  search  space  is 
significantly  reduced. 

There are several ways in which the segmentation problem can be solved. The way that we pursue here (without
further detail) is to determine the labeling in a probabilistic framework.

3.2.  Tracklets  from  Detections 

One way to solve this problem is to generate possible tracklets (hypotheses), and find zero or more that best
explain the detections. Note that the search space is significantly smaller than before the segmentation, as
we only need to explain the valid detections. Quite often there will be only one hypothesis.

The  possible  tracklets  are  generated  by  following  certain  parent  pointers  up  to  the  root  of  the  tree  from 
each  valid  detection,  and  removing  any  tracklet  that  is  a  prefix  of  another.  To  determine  which  (combination) 
of  these  best  explains  the  detections  we  first  remove  those  tracklets  that  do  not  satisfy  certain  criteria.  These 
criteria  are: 

•  the  number  of  detections  in  the  temporal  window  must  be  at  least  half  the  window  size 

•  the average acceleration of the object must be less than a threshold (≈ 6 m/s²)

•  the  object  must  be  undergoing  a  smooth  motion. 

The  second  step  is  to  merge  tracklets  that  are  a  result  of  split-merge  events.  This  is  done  by  repeatedly 
merging  tracklets  that  have  similar  appearance  and  motion. 
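The three filtering criteria listed above can be sketched as a single predicate. This is only a sketch under stated assumptions: positions are taken to be in metres on the ground plane, acceleration is estimated by second finite differences, and, because the text does not define “smooth motion”, it is approximated here by a bound on the heading change between consecutive steps. The name `passes_criteria` and the parameter `max_turn_deg` are hypothetical.

```python
import numpy as np


def passes_criteria(positions, n_detections, window_size, dt,
                    max_acc=6.0, max_turn_deg=45.0):
    """Return True if a candidate tracklet satisfies the three criteria.

    positions    : sequence of (x, y) positions in metres, one per frame
    n_detections : number of detections supporting the tracklet
    window_size  : temporal window size in frames
    dt           : time between frames in seconds
    max_acc      : acceleration threshold in m/s^2 (about 6 in the text)
    max_turn_deg : assumed smoothness bound on the heading change
    """
    # Criterion 1: at least half the window must be covered by detections.
    if n_detections < window_size / 2:
        return False
    p = np.asarray(positions, dtype=float)
    if len(p) < 3:
        return False  # too short to estimate an acceleration
    vel = np.diff(p, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    # Criterion 2: average acceleration magnitude below the threshold.
    if np.linalg.norm(acc, axis=1).mean() >= max_acc:
        return False
    # Criterion 3 (assumed form): bounded heading change between steps.
    speed = np.linalg.norm(vel, axis=1)
    moving = (speed[:-1] > 1e-9) & (speed[1:] > 1e-9)
    dots = np.sum(vel[:-1] * vel[1:], axis=1)[moving]
    cos = dots / (speed[:-1] * speed[1:])[moving]
    turn = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return bool(np.all(turn <= max_turn_deg))


straight = [(0, 0), (10, 0), (20, 0), (30, 0)]
zigzag = [(0, 0), (10, 0), (0, 0), (10, 0)]
ok_straight = passes_criteria(straight, n_detections=4, window_size=8, dt=0.5)
ok_zigzag = passes_criteria(zigzag, n_detections=4, window_size=8, dt=0.5)
ok_sparse = passes_criteria(straight, n_detections=3, window_size=8, dt=0.5)
```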

3.3.  Occlusion  Handling 

So far in the discussion we have assumed there is no occlusion or missing data. It turns out that when an
object is occluded but the occluder is detected, the algorithm as presented still works. This is because the
detection tree does not really change, except that no detections in the frames where the object is occluded
will be valid. A tracklet is still found, provided that the object is not occluded for most of the window. If it
is occluded for most of the window, the tracklet fails the first criterion above.

The  real  problem  that  needs  to  be  handled  is  missing  detections.  When  there  is  a  missed  detection,  the 
detection  tree  will  be  shorter  than  T.  If  it  is  too  short,  any  tracklets  that  are  found  will  not  have  enough 
detections  and  fail  the  first  criterion  above.  This  problem  is  solved  by  adding  “virtual  detections”  to  the 
detection  tree.  These  are  added  whenever  a  detection  in  frame  t  has  no  nearby  detections  in  frame  t  + 1.  The 
position  of  this  virtual  detection  is  estimated  using  the  motion  model,  and  the  appearance  is  copied  from  the 
(detected)  parent.  This  procedure  is  recursive,  so  that  when  a  newly  added  virtual  detection  does  not  have 
nearby  detections  in  the  next  frame,  the  process  is  repeated. 
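The virtual-detection procedure can be sketched for a single chain of the tree. The simplifications are assumptions made for brevity: the motion model is constant velocity, “nearby” is measured from the predicted position, only the closest detection is followed, and the names `extend_with_virtual`, `pos`, `appearance` and `virtual` are hypothetical.

```python
import math


def extend_with_virtual(start, velocity, frames, radius):
    """Follow one chain of detections, inserting virtual detections on misses.

    start    : dict with "pos" (x, y) and "appearance"
    velocity : initial per-frame displacement (dx, dy)
    frames   : frames[t] lists the detections (same dict layout) of later frames
    radius   : assumed threshold defining "nearby"
    """
    chain = [dict(start, virtual=False)]
    vel = velocity
    for detections in frames:
        parent = chain[-1]
        predicted = (parent["pos"][0] + vel[0], parent["pos"][1] + vel[1])
        near = [d for d in detections
                if math.dist(d["pos"], predicted) <= radius]
        if near:
            closest = min(near, key=lambda d: math.dist(d["pos"], predicted))
            node = dict(closest, virtual=False)
        else:
            # Missed detection: position from the motion model, appearance
            # copied from the parent. Because the loop continues from this
            # node, consecutive misses are filled in the same way.
            node = {"pos": predicted, "appearance": parent["appearance"],
                    "virtual": True}
        vel = (node["pos"][0] - parent["pos"][0],
               node["pos"][1] - parent["pos"][1])
        chain.append(node)
    return chain


chain = extend_with_virtual(
    start={"pos": (0, 0), "appearance": "a"},
    velocity=(1, 0),
    frames=[[{"pos": (1, 0), "appearance": "b"}],
            [],
            [{"pos": (3, 0), "appearance": "c"}]],
    radius=0.5,
)
```

A tracklet built this way can still be rejected by the first criterion of Section 3.2 if too few of its nodes are real detections.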

3.4.  Experimental  Results 

We have evaluated the multi-object tracker on sequences captured from an airborne sensor. The sequences
come from the publicly available CLIF 2006 dataset [1]. The video is captured at roughly 2 Hz, and it is
in grayscale. As this is a large-format video of roughly 6600x7500 pixels, we chose 640x480 subregions over
an expressway for the purposes of evaluation. The sequences were stabilized prior to tracking. All of the
sequences used in evaluation are available for download.

The  only  moving  objects  in  the  video  are  vehicles,  but  they  are  in  very  low  resolution.  Each  vehicle 
is  only  about  7x7  pixels,  which  makes  detection  and  tracking  quite  challenging.  Since  this  low  resolution 
gives  very  limited  appearance  information,  we  used  a  simple  sum  of  squared  differences  function  as  our 
appearance  similarity.  This  is  computed  after  doing  a  least-squares  alignment  of  the  two  image  patches.  The 
alignment  is  parameterized  by  translation  and  rotation.  In  this  context,  this  parameterization  is  satisfactory. 

The moving object detection was done using background subtraction. The background is modeled as
the mode of a (stabilized) sliding window of frames. We have also tried a mixture-of-Gaussians model, but
we did not see a significant difference. A window size of 16 frames, corresponding to about 8 seconds of
video, was used.

                     ODR    FAR    NTF    IDC
With virtual det.    0.72   0.04   1.01   0.84
No virtual det.      0.61   0.03   1.04   0.90

Table 3.1: Effect of virtual detections on tracking performance.

Window Sz.    ODR    FAR    NTF    IDC
10            0.76   0.04   1.06   0.86
12            0.73   0.04   1.00   0.85
14            0.72   0.04   1.00   0.87
16            0.72   0.04   1.01   0.84
18            0.71   0.05   1.00   0.83
20            0.63   0.04   1.00   0.89

Table 3.2: Effect of sliding window size on tracking performance.

Ground truth tracks were manually generated for an 80-frame sequence containing 168 vehicles. Tracks
shorter than the window size were not used in evaluation (for a window size of 16, this left 123 tracks).
Several metrics were used to measure performance: object detection rate (ODR), false alarm rate (FAR),
normalized track fragmentation (NTF), and ID consistency (IDC). The definitions of ODR and NTF are
the same as in [121]. The false alarm rate is the number of false detections divided by the total number of
detections (not computed per track and averaged). The last metric is introduced to measure the tendency
of the tracks to “jump” or switch IDs. To calculate this metric, we first label each detection in a ground truth
track with the ID(s) of the test track(s), if any, that contain an overlapping detection. The ID consistency
measures the largest fraction of detections in a ground truth track labeled with one label. Like NTF, it is
weighted by the length of the track to avoid favoring short tracks. More formally, let $G = \{g_i\}$ be the set of
ground truth tracks, let $g_i = \{g_{ij}\}$ be the set of detections in each track, and let $L(g_{ij})$ be the label of the test
track associated with the detection $g_{ij}$. The ID consistency of a track $g_i$ is


$$IDC_i = \max_{l} \frac{|\{g_{ij} \ \text{s.t.}\ L(g_{ij}) = l\}|}{|g_i|} \tag{3.4}$$


and  the  overall  ID  consistency  is 


$$IDC = \frac{1}{\sum_{g_i} |g_i|} \sum_{g_i} |g_i| \, IDC_i \tag{3.5}$$


The  best  ID  consistency  is  1. 
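As a worked illustration, the ID consistency metric can be computed as in the sketch below. The input layout is an assumption: each ground truth track is a list with one entry per detection, and each entry is the set of test-track IDs that overlap that detection (empty if none). The name `id_consistency` is hypothetical.

```python
from collections import Counter


def id_consistency(tracks):
    """Length-weighted ID consistency over a set of ground truth tracks.

    tracks : list of ground truth tracks; each track is a list of sets of
             test-track IDs, one set per detection (empty set if unmatched).
    Assumes at least one non-empty track.
    """
    total = sum(len(track) for track in tracks)
    weighted = 0.0
    for track in tracks:
        if not track:
            continue
        counts = Counter()
        for ids in track:
            counts.update(ids)  # detections carrying each label
        # Per-track IDC: largest fraction of detections sharing one label.
        idc_i = max(counts.values(), default=0) / len(track)
        weighted += len(track) * idc_i  # weight by track length
    return weighted / total


# Track 1: label 1 covers 2 of 4 detections (IDC_1 = 0.5).
# Track 2: label 3 covers 2 of 2 detections (IDC_2 = 1.0).
idc = id_consistency([[{1}, {1}, {2}, set()], [{3}, {3}]])
# Overall: (4 * 0.5 + 2 * 1.0) / 6 = 2/3
```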

Quantitative results using these metrics are shown in Tables 3.1 and 3.2. The first table shows the effect
of using virtual detections in tracking. The second table shows the effect of different sliding window sizes
on tracking. Qualitative results are shown in the following figures. Figure 3.7 shows tracking results on
sample frames of the sequence used for evaluation.


4.  Motion Pattern Analysis and Its Application to Detecting and Tracking Objects on a Moving Platform

We  first  address  the  general  motion  pattern  analysis,  and  then  discuss  the  specific  property  of  the  motion 
pattern  created  by  moving  vehicles  in  airborne  videos. 

Consider a 2D point P smoothly traversing a spatio-temporal space. By projecting the motion of the
point into a 4D space, $(x, y, v_x, v_y)$, where $(x, y)$ is the location of the point and $(v_x, v_y)$ denotes the time
derivatives of motion along the x and y axes, we obtain a fiber (dimensionality one) in the 4D space that
represents the motion characteristic of that point. If a set of 2D points (e.g. on the same object) are moving
in a similar way, a bundle of fibers forms a flow. Many types of object motion can be represented by a flow.
A single moving vehicle or a convoy of vehicles observed from an airborne camera are typical cases.
Generally, we define the motion pattern in the 4D space as a set of motion vectors,

$$T = \{(x, y, v_x, v_y),\ (x, y) \in \mathbb{R}^2\}. \tag{3.6}$$

Figure 3.7: Tracking results on sequence 1. A green box denotes a real detection, whereas a yellow box
denotes an interpolated detection.

Figure 3.8: Example of occlusion handling. (a) Frame 49; (b) Frame 51; (c) Frame 54.


Without loss of generality, in one motion pattern, one motion vector $(v_x, v_y)$ is assigned at one location
$(x, y)$, i.e. $(v_x, v_y)$ is a function of $(x, y)$: $(v_x, v_y) = F(x, y)$. Note that multiple motion
patterns may exist at the same location (e.g. at a road intersection). Objects whose motion complies with the same
motion pattern are called objects moving in the same motion context. A motion flow can be regarded as one
particular type of motion pattern.

The motion pattern defined in Eq. 3.6 essentially describes the general motion characteristics of objects
over a period of time. In practice, the motion estimate of one object at a given time inevitably contains noise.
The estimated motion vectors in a motion pattern T can be written as $T = \{(x, y, F(x, y) + \epsilon)\}$, where $\epsilon$
accounts for the noise in motion estimation. We aim to analyze the general motion pattern from multiple
noisy motion vectors over time, and then use this information to facilitate detection and tracking of each
object in the motion pattern.

The essential property of a motion pattern is that each smooth motion pattern corresponds to a smooth
sheet in the 4D space, i.e. the local dimensionality is 2. It is easy to see that the dimensionality is 2, since

•  the projection of a motion pattern onto $(x, y)$ space has dimensionality 2 in non-degenerate cases, thus
the dimensionality in the 4D space is no less than two;

•  at most one motion vector is assigned at one location, thus the local dimensionality is no more than
two.

According to the smooth motion assumption, the normal of the sheet created by one motion pattern changes
smoothly, and different motion patterns produce discontinuities between them in 4D space. Noise caused by
erroneous optical flow does not form a coherent sheet with local smoothness.

Suppose we have a set of noisy input samples in the 4D space, and we aim to find smooth 2D sheets in this
space. To analyze the normal and tangent space at each point, and so infer the geometric structure while
filtering out noise, we adopt Tensor Voting.

Tensor Voting [161] can be regarded as an unsupervised computational framework to estimate local
geometric information. Tensor voting has been shown to be capable of estimating structures in N-D space
from very noisy input data. In the Tensor Voting framework, the local geometric information at one point in
N-D space is encoded in a symmetric, nonnegative definite matrix. The local geometry can be derived by
examining its eigensystem. Recall that a tensor can be decomposed as

$$T = \sum_{i=1}^{N} \lambda_i e_i e_i^{\top} = \sum_{i=1}^{N-1} (\lambda_i - \lambda_{i+1}) \sum_{k=1}^{i} e_k e_k^{\top} + \lambda_N \sum_{k=1}^{N} e_k e_k^{\top} \tag{3.7}$$

where $\{\lambda_i\}$ are the eigenvalues arranged in descending order, $\{e_i\}$ are the corresponding eigenvectors, and
N is the dimensionality of the input space. The decomposition in Eq. 3.7 provides a way to interpret the local
geometry. The largest gap between two consecutive eigenvalues, $\lambda_i - \lambda_{i+1}$, indicates the dimensionality d,

$$d = \arg\max_{i} (\lambda_i - \lambda_{i+1}) \tag{3.8}$$

The largest difference value $\lambda_d - \lambda_{d+1}$ is the saliency of the dimensionality. The corresponding eigenvectors
$\{e_1, \ldots, e_d\}$ span the normal space of the structure, and $e_{d+1}, \ldots, e_N$ span the tangent space. In our case, we
are interested in the structures whose normal space has dimensionality 2 in the 4D space. Given the input
data, a set of 4D motion vectors $\{f_i\}$, we encode each sample as a ball tensor, which indicates no orientation,
since at the beginning we have no knowledge of the local structure at a point. Each $f_i$ receives a vote
from its neighbors $f_j$ in 4D space. The voting result at one point, which indicates its geometric property,
is obtained by adding up all the incoming votes from its neighbors. The vote from a voter $f_i$ to a receiver
$f_j$ encodes the tensor at $f_i$, and the orientation and the distance from $f_i$ to $f_j$. The result of this process can be
interpreted as a local, nonparametric estimation of the geometric structure at each sample position. After
accumulating all cast tensors, the local geometry can be interpreted according to Eq. 3.8.
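The interpretation step of Eq. 3.8 can be sketched with NumPy. The sketch covers only the eigen-gap analysis of an already accumulated tensor, not the voting itself. The function name `normal_space_dimension` is hypothetical, and the gaps are restricted to i = 1, ..., N-1, which is an assumption about how the arg max is taken.

```python
import numpy as np


def normal_space_dimension(tensor):
    """Interpret an accumulated symmetric tensor through its eigen-gaps.

    Returns (d, saliency, normal_basis, tangent_basis), where d is the index
    of the largest gap between consecutive eigenvalues (Eq. 3.8), saliency is
    that gap, and the bases are stored as matrix columns.
    """
    eigvals, eigvecs = np.linalg.eigh(tensor)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    lam = eigvals[order]
    vecs = eigvecs[:, order]
    gaps = lam[:-1] - lam[1:]                  # lambda_i - lambda_{i+1}
    d = int(np.argmax(gaps)) + 1               # 1-based dimensionality
    return d, float(gaps[d - 1]), vecs[:, :d], vecs[:, d:]


# Toy tensor in 4D with two dominant eigenvalues: a 2D normal space,
# i.e. the signature of a smooth 2D sheet.
d, saliency, normal, tangent = normal_space_dimension(
    np.diag([5.0, 5.0, 0.1, 0.1]))
```

In the segmentation step described next, a sample would be kept when `d == 2` and `saliency` exceeds a chosen threshold.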

To detect and segment motion patterns, we take the original optical flows computed across multiple frames
as input, without any pruning or clustering. After the voting process, we examine the cast tensor T and keep
the structures of dimensionality 2 with saliency larger than a threshold. After this tensor voting process,
most of the structures created by noise are filtered out. Multiple motion vectors may belong
to the same motion pattern over time, so we use the average of the motion vectors to represent the estimated
motion pattern. Note that we only average motion vectors on the same sheet in the 4D space. This averaging
is essentially different from prefiltering, as it is performed after we have knowledge of the local structures.
After detecting the desired structure and filtering out the noise, we use a flood-fill algorithm in 4D space to
segment each motion pattern. According to the smooth motion assumption, the sheet formed by one motion
context has local smoothness, and discontinuity exists between sheets caused by different motion patterns.
Neighboring samples in 4D space that have similar normals are assigned the same label. We use principal
angles [23] to measure the similarity between two normal spaces. Two examples of motion patterns are
shown in Figure 3.9. The video used in the second example is of a road intersection, where multiple
motion patterns exist at some locations. Directly smoothing in the $(x, y)$ space would destroy such motion patterns.
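The comparison of two normal spaces by principal angles can be sketched as follows. It uses the standard construction in which the cosines of the principal angles are the singular values of $Q_a^{\top} Q_b$ for orthonormal bases $Q_a$, $Q_b$. The function names and the 10-degree threshold are assumptions for illustration and are not values from the text.

```python
import numpy as np


def principal_angles(a, b):
    """Principal angles (radians) between the column spaces of a and b."""
    qa, _ = np.linalg.qr(a)  # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b)  # orthonormal basis of span(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def similar_normals(a, b, max_angle_deg=10.0):
    """True if the largest principal angle is below the assumed threshold."""
    return bool(np.degrees(principal_angles(a, b).max()) <= max_angle_deg)


# Two 2D normal spaces in 4D: identical, and mutually orthogonal.
n1 = np.eye(4)[:, :2]
n2 = np.eye(4)[:, 2:]
same = similar_normals(n1, n1)
different = similar_normals(n1, n2)
```

A flood fill over neighboring samples could use `similar_normals` as its criterion for propagating a label.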

We have discussed the properties of general motion patterns in 4D space. Now, we analyze motion
patterns in airborne videos, where we aim to find those created by moving vehicles. The essential difference
between motion patterns created by parallax and by moving vehicles is as follows. After principal motion
estimation, the motion pattern of moving vehicles is generated by the intrinsic properties of the objects and
static environmental constraints on the ground plane (e.g. road or non-road area), and is independent of
camera motion. The motion pattern of parallax is caused by the camera motion, as each motion vector on a
3D structure should be along the epipolar line that is determined by the camera’s translation. According to the
relative affine structure [143], the projection $p_t$ of a 3D point P on $I_t$ can be decomposed as

$$p_t = \underbrace{A_{t,r}\, p_r}_{(1)} + \underbrace{k\, e_t}_{(2)} \tag{3.9}$$

where $p_r$ is the projection of P in the reference frame, k is a scalar that is independent of the camera
pose at time t, and $e_t$ is the epipole at time t. The first term in Eq. 3.9 is compensated for by the affine
motion. From the second term, we can see that the motion of parallax ($k e_t$) is indeed determined by the camera
motion. Interestingly, when the epipole is moving in a non-smooth way, the motion of parallax cannot form
smooth patterns, so non-smooth epipole motion actually helps us to remove parallax. When the camera
is moving in a smooth way, however, the parallax can still form a smooth motion pattern. Specifically,
in airborne videos, the motion patterns of moving vehicles form flows. Such a flow shows a fiber property
(dimensionality 1) on a larger scale. This property is due to the fact that the motion range of a vehicle over
time is much larger than its 2D dimensions. In order to examine the geometric property at a larger scale,
we can simply enlarge the voting scale. In our experiments, we observe that, when we enlarge the scale of
voting, the motion field caused by a small 3D structure becomes a point tensor, whereas it remains a sheet for a
large 3D structure. Thus, the procedure for segmenting the motion field created by moving vehicles is: first
vote at a small scale and keep only the 2D structures to remove noise, and then vote at a large scale and keep
only the 1D fiber structures. In practice, instead of directly enlarging the voting scale, we down-sample the
4D space to achieve an efficient implementation.

Figure 3.9: Examples of motion pattern segmentation: (a)(d) scenes with traffic flows; (b)(e) motion
patterns shown in $(x, y, v_x)$ space; (c)(f) segmentation of motion patterns.

The motion flow created by moving vehicles may be fragmented due to occlusion. Thus, we propose a
method to stitch the fragments together. After we find each flow, we randomly place several “floats” (square 2D
Gaussian kernels) in the flow and apply a mean-shift-like method along both the positive and negative directions;
the “floats” terminate at the ends of the flow. After we have the ends (both entry and exit) of the flows, we
use a vote-casting method inspired by Tensor Voting to calculate the motion consistency between flows.

Given the flow information, detection and tracking become much easier. First, most of the residual
pixels caused by noise and parallax have been filtered out. Second, the local dynamics in the motion field
are known. In residual images along the flows, we adopt the motion history image method to segment
independent motion regions. Each segmented region is represented as an oriented rectangle. An association
score between regions $R_i$ and $R_j$ from neighboring ($|i - j| \le 5$) frames encodes both appearance similarity
and consistency with the local motion field as:

$$P_{ij} = CS_{ij}\, \exp\left(-\frac{|R(i) - R_j(i)| + |R(j) - R_i(j)|}{2}\right) \tag{3.10}$$

The  appearance  similarity  is  simply  the  normalized  cross  correlation  between  two  image  patches,  Rj{i) 
is  the  predicted  location  from  j  to  i  by  using  j i  —  j  steps  of  mean  shift  (the  direction  is  sign(i  —  j))  in  the 
motion  field.  According  to  this  similarity  measure,  we  aggregate  these  isolated  regions  from  different  frames 
into tracklets. We further filter out isolated regions and very short tracklets that come from noisy motion
segmentation. For a pre-filtered tracklet, we use the average image patch of the oriented rectangles as its
appearance template. A local translation relaxation is used to find the best matching location when averaging the
appearance template. The motion of a tracklet is encoded in its start and end points. Then, we apply the
Hungarian algorithm to associate tracklets into tracks. Here, we encode the entry and exit information of
a flow in the utility matrix used in the Hungarian algorithm. Suppose there are $n$ tracklets in the pool; the
utility matrix $A_{2n \times 2n}$ is a matrix of size $2n \times 2n$. $A_{(1,\dots,n)\times(1,\dots,n)}$, except its diagonal elements, contains
the similarity between any pair of tracklets; the diagonal of $A_{(n+1,\dots,2n)\times(1,\dots,n)}$ stores the termination
probability of each tracklet, which is computed according to the distance between the end point of a tracklet
and the exit of the flow; the diagonal of $A_{(1,\dots,n)\times(n+1,\dots,2n)}$ stores the birth probability of each tracklet,
which is computed according to the distance between the start point of a tracklet and the entry of the flow.
All the other elements in $A$ are zero. By expanding the similarity matrix, we impose the environmental
information on the tracklet association to avoid fragmented tracks in the middle of a flow. Note that the
tracklet association is performed among flows that have been stitched in the motion pattern analysis phase.
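The layout of the expanded utility matrix can be illustrated with a small sketch. The numbers below are made up, the function names are hypothetical, and the reading of "row i matched to column j" as a link between tracklets i and j is an assumption. A brute-force search over permutations stands in for the Hungarian algorithm, which is only reasonable for a handful of tracklets; a real implementation would call a Hungarian solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True`.

```python
from itertools import permutations


def build_utility(sim, birth, term):
    """Expand an n x n tracklet similarity matrix into the 2n x 2n utility matrix."""
    n = len(sim)
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i][j] = sim[i][j]   # top-left block: pairwise similarity
        A[i][n + i] = birth[i]        # diagonal of the top-right block: birth
        A[n + i][i] = term[i]         # diagonal of the bottom-left block: termination
    return A


def best_assignment(A):
    """Maximum-utility assignment by exhaustive search (stand-in for Hungarian)."""
    size = len(A)
    return max(permutations(range(size)),
               key=lambda cols: sum(A[r][c] for r, c in enumerate(cols)))


# Hypothetical values for two tracklets.
sim = [[0.0, 0.9], [0.0, 0.0]]
birth = [0.1, 0.7]
term = [0.6, 0.1]

A = build_utility(sim, birth, term)
assignment = best_assignment(A)
n = len(sim)
# Matches that fall inside the top-left block are tracklet-to-tracklet links.
links = [(r, c) for r, c in enumerate(assignment) if r < n and c < n]
```

With these values the strong similarity between the two tracklets outweighs the birth and termination entries they would otherwise use, so the two tracklets are linked rather than treated as separate tracks.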

The video shown in Figure 3.10 contains strong parallax and a convoy of vehicles passing through a
forest where long-term occlusions occur. This video challenges existing motion segmentation and tracking
methods. The residual pixels that do not belong to valid motion patterns are shown in red in Figure 3.10.
Such regions caused by parallax, which sometimes form larger regions than moving objects, cannot be
filtered out by morphological operations or the motion history image method. By flow stitching, the long
occlusion is correctly handled in the forest video. In Figure 3.10, we show the estimated motion field
after flow segmentation in both the mosaic space and the image space.

5. Online Appearance Modeling with Co-training of Hybrid Discriminative Generative Trackers

The tag-and-track problem, approached by modeling the appearance of the target, can be formulated in two different ways:
generative and discriminative. Generative tracking methods learn a model to represent the appearance of
an object. Tracking is then expressed as finding the object appearance most similar to the model. Examples
of generative tracking algorithms are Eigentracking [24] and IVT [91]. To adapt to appearance
changes, the object model is often updated online, as in [91]. Because appearance variations
are highly non-linear, multiple subspaces [84] and non-linear manifold learning methods [53] have been
proposed.

Instead of building a model to describe the appearance of an object, discriminative tracking methods
aim to find a decision boundary that can best separate the object from the background. Recently, many
discriminative trackers have been proposed [10], [33, 115] and demonstrate strong robustness against distracters
in the background. In order to update the decision boundary according to new samples and background,
discriminative tracking methods with online learning are proposed in [10, 33]. Oza and Russell [116] proposed
an online boosting algorithm, which has been applied to the visual tracking problem [65, 94]. Due to the
large number of features, either an offline feature selection procedure or an offline trained seed classifier is
usually required in practice. Thus, tracking methods based on online boosting are difficult to generalize
to arbitrary object types.

We propose to use co-training to combine generative and discriminative models. Here, the online learning
of an appearance model of an arbitrary object with limited labeled data is treated as a semi-supervised
problem. The co-training approach proposed by Blum and Mitchell [26] is a principled semi-supervised
training method. The basic idea is to train two classifiers on two conditionally independent views of the
same data (with a small number of exemplars) and then use the prediction from each classifier to enlarge the
training set of the other. It has been proved that co-training can find an accurate decision boundary, starting from a
small quantity of labeled data, as long as the two feature sets are independent [26].
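The co-training idea can be sketched on toy data. This is an illustration only: each sample has two scalar "views", each view gets a nearest-class-mean classifier, and in every round each classifier labels the unlabeled sample it is most confident about and hands it to the other view's training set. The classifiers, the confidence measure and all names are assumptions made for this sketch and are not the models used in the report.

```python
def fit(points):
    """Per-class means for labeled 1D points [(value, label), ...] with labels 0/1."""
    means = {}
    for label in (0, 1):
        vals = [v for v, l in points if l == label]
        means[label] = sum(vals) / len(vals)
    return means


def predict(means, v):
    """Return (label, confidence) from the distances to the two class means."""
    d0, d1 = abs(v - means[0]), abs(v - means[1])
    return (0 if d0 < d1 else 1), abs(d0 - d1)


def co_train(labeled, unlabeled, rounds=3):
    """labeled: [((view1, view2), label)], unlabeled: [(view1, view2)]."""
    train = [[(x[0], y) for x, y in labeled], [(x[1], y) for x, y in labeled]]
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            means = fit(train[view])
            # The classifier on this view picks its most confident unlabeled sample
            best = max(pool, key=lambda x: predict(means, x[view])[1])
            label, _ = predict(means, best[view])
            pool.remove(best)
            # and passes it, with the predicted label, to the other view's training set.
            other = 1 - view
            train[other].append((best[other], label))
    return fit(train[0]), fit(train[1])


labeled = [((0.0, 10.0), 0), ((5.0, 20.0), 1)]
unlabeled = [(0.5, 11.0), (4.5, 19.0), (1.0, 12.0), (4.0, 18.0)]
model_view1, model_view2 = co_train(labeled, unlabeled)
```

Each classifier ends up trained on more samples than the two initial exemplars, even though no additional ground-truth labels were supplied.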

(a) A close look at the motion field with ends and entry-to-sink paths

(b) Snapshots of tracking multiple vehicles in the flows. Red indicates the residual pixels that do not
belong to valid motion patterns.

Figure 3.10: Tracking with strong parallax

Figure 3.11: Online co-training of a generative tracker and a discriminative tracker with different life spans (the
area bounded by dashed red boxes indicates the background)

We formulate the visual tracking problem as a state estimation problem. Given a sequence of observed
image regions $O_t = (o_1, \dots, o_t)$ over time $t$, the goal of visual tracking is to estimate the hidden state $s_t$. In
our case, the hidden state refers to an object’s 2D position, scale and rotation. Assuming a Markovian state
transition, the posterior can be formulated as a recursive equation

$$p(s_t \mid O_t) \propto p(o_t \mid s_t) \int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid O_{t-1})\, ds_{t-1} \qquad (3.11)$$


where $p(o_t \mid s_t)$ and $p(s_t \mid s_{t-1})$ are the observation model and state transition model, respectively. $p(s_{t-1} \mid O_{t-1})$,
which is represented as a set of particles and weights, is the posterior distribution given all the observations
up to time $t - 1$. The recursive inference in Eq. 3.11 is implemented with resampling and importance sampling
processes. In our approach, the transition of the hidden state is assumed to be a Gaussian distribution,
$p(s_t \mid s_{t-1}) = N(s_t; s_{t-1}, \Psi_t)$, where $\Psi_t$ is a time-variant diagonal covariance matrix. In this recursive
inference formulation, $p(o_t \mid s_t)$ is the crucial part for finding the ideal posterior distribution. $p(o_t \mid s_t)$ measures
the likelihood of observing $o_t$ given one state of the object. Besides the 2D position, our state variables
encode an object’s rotation and scale. This reduces the appearance variations caused by such motion, at the
price that more particles are needed to represent the distribution.
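One step of the recursion in Eq. 3.11 can be sketched with a one-dimensional state. This is a minimal illustration under stated assumptions: the state is a scalar rather than position, scale and rotation, the transition noise is a fixed Gaussian, and the observation likelihood is a simple Gaussian stand-in for the appearance models. All names are hypothetical.

```python
import math
import random


def particle_filter_step(particles, weights, observation, sigma_motion, likelihood, rng):
    """Resample, diffuse with a Gaussian transition, then reweight by the likelihood."""
    n = len(particles)
    # Resampling: draw particles in proportion to their posterior weights at t-1.
    resampled = rng.choices(particles, weights=weights, k=n)
    # State transition: Gaussian centered on the previous state.
    predicted = [s + rng.gauss(0.0, sigma_motion) for s in resampled]
    # Importance weights: observation likelihood p(o_t | s_t), then normalization.
    new_weights = [likelihood(observation, s) for s in predicted]
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]


def gaussian_likelihood(o, s, sigma_obs=1.0):
    """Toy observation model; the report uses appearance models instead."""
    return math.exp(-0.5 * ((o - s) / sigma_obs) ** 2)


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for obs in [3.0] * 5:
    particles, weights = particle_filter_step(
        particles, weights, obs, 0.5, gaussian_likelihood, rng)

# Posterior mean as the state estimate.
estimate = sum(p * w for p, w in zip(particles, weights))
```

Adding rotation and scale to the state enlarges the space the particles must cover, which is why more particles are needed.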

Our measurement of one observation comes from two independent models. One is the generative model,
which is based on online constructed multi-subspaces. The other is the discriminative model, which is
trained online with HOG features. The features used by these two models, namely intensity pattern and
local gradient features, are complementary. After limited initialization, these two models are co-trained
with sequential unlabeled data. The final decision is made by the combined hybrid model. Due to the
independence between the two observers, our observation model $p(o_t \mid s_t)$ can be expressed as a product
of two likelihood functions from the generative model $\mathcal{M}$ and the discriminative model $\mathcal{C}$:
$p(o_t \mid s_t) \propto p_{\mathcal{M}}(o_t \mid s_t)\, p_{\mathcal{C}}(o_t \mid s_t)$.

Let $\mathcal{M} = \{\Omega_1, \dots, \Omega_L\}$ represent the appearance manifold of one object and let $\Omega_l$, $l \in [1, \dots, L]$, denote the
local sub-manifolds. An appearance instance $x$ is a $d$-dimensional image vector. Let $\Omega_l = (\bar{x}_l, U_l, \Lambda_l, n_l)$ denote
one sub-manifold, where $\bar{x}_l$, $U_l$, $\Lambda_l$ and $n_l$ represent the mean vector, eigenvectors, eigenvalues and the
size (number of samples) of the subspace, respectively. For simplicity, we omit the subscript when this causes
no confusion. Here, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ contains the sorted eigenvalues of the subspace, $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. An
$\eta$-truncation is usually used to truncate the subspaces, namely $m = \arg\min_i \left( \sum_{j=1}^{i} \lambda_j / \mathrm{tr}(\Lambda) \ge \eta \right)$. From
a statistical point of view, a subspace with $m$ eigenbases can be regarded as an $m$-dimensional Gaussian
distribution. Suppose $\Omega$ is a subspace with the first $m$ eigenvectors; the projection of $x$ on $\Omega$ is
$y = (y_1, \dots, y_m)^T = U^T(x - \bar{x})$. Then, the likelihood of $x$ can be expressed [107] as

$$p(x \mid \Omega) = \left[ \frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{m} \frac{y_i^2}{\lambda_i}\right)}{(2\pi)^{m/2} \prod_{i=1}^{m} \lambda_i^{1/2}} \right] \cdot \left[ \frac{\exp\left(-\frac{\varepsilon^2(x)}{2\rho}\right)}{(2\pi\rho)^{(d-m)/2}} \right] \qquad (3.12)$$


where $\varepsilon(x) = |x - UU^T x|$ is the projection error, namely the L2 distance between the sample $x$ and its projection
on the subspace. The parameter $\rho$ is chosen as $\frac{1}{d-m}\sum_{i=m+1}^{d} \lambda_i$ [107], or $\frac{1}{2}\lambda_{m+1}$ is used as a rough approximation.

By using Eq. 3.12, we can evaluate the confidence of a sample from one subspace. As our generative model
contains multiple subspaces (each subspace can be regarded as a hyper-ellipsoid), we maintain the neighborhood
according to the L2 distance between the mean vectors of subspaces. To evaluate the confidence of one
sample from such a generative model, we use the maximum confidence of the $K$-nearest (we use $K = 4$ in
experiments) neighboring subspaces.
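The truncation rule and the two-part likelihood can be sketched as follows. This is a small pure-Python illustration, not the report's implementation: the basis is passed as a list of orthonormal vectors, the likelihood is evaluated in log form to avoid underflow, and the residual is computed on the mean-centered vector, which is an assumption of this sketch. Function names are hypothetical.

```python
import math


def truncate(eigvals, eta):
    """Smallest m whose leading eigenvalues hold at least a fraction eta of the trace."""
    total = sum(eigvals)
    running = 0.0
    for m, lam in enumerate(eigvals, start=1):
        running += lam
        if running / total >= eta:
            return m
    return len(eigvals)


def log_likelihood(x, mean, U, eigvals, rho):
    """Log of the subspace likelihood; U holds m orthonormal vectors of length d."""
    d, m = len(x), len(U)
    centered = [xi - mi for xi, mi in zip(x, mean)]
    y = [sum(u_k * c_k for u_k, c_k in zip(u, centered)) for u in U]
    # In-subspace (Mahalanobis) term with its Gaussian normalizer.
    in_space = -0.5 * sum(yi * yi / lam for yi, lam in zip(y, eigvals))
    in_space -= 0.5 * m * math.log(2 * math.pi)
    in_space -= 0.5 * sum(math.log(lam) for lam in eigvals)
    # Out-of-subspace term from the squared projection error.
    err2 = max(sum(c * c for c in centered) - sum(yi * yi for yi in y), 0.0)
    out_space = -err2 / (2 * rho) - 0.5 * (d - m) * math.log(2 * math.pi * rho)
    return in_space + out_space


m = truncate([4.0, 3.0, 2.0, 1.0], eta=0.6)
basis = [[1.0, 0.0, 0.0]]
ll_inside = log_likelihood([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], basis, [4.0], rho=1.0)
ll_outside = log_likelihood([0.0, 2.0, 0.0], [0.0, 0.0, 0.0], basis, [4.0], rho=1.0)
```

A sample displaced along the retained eigenvector scores higher than one displaced by the same amount orthogonally to the subspace, because the residual term penalizes the projection error with the smaller variance `rho`.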

In order to represent a large number of sequential samples, we use a fixed number of subspaces: if the
number of subspaces exceeds a predetermined maximum, the two most similar subspaces are merged. In
order to maintain the local property of the subspaces, merging only happens between neighboring subspaces.
Merging two subspaces and measuring the similarity between two subspaces are two critical steps in this
algorithm.

Suppose there are two subspaces $\Omega_1 = (\bar{x}_1, U_1, \Lambda_1, N)$ and $\Omega_2 = (\bar{x}_2, U_2, \Lambda_2, M)$, which we are trying
to merge into a new subspace $\Omega = (\bar{x}, U, \Lambda, M + N)$. If the dimensions of $\Omega_1$ and $\Omega_2$ are $p$ and $q$, the dimension
$r$ of the merged subspace $\Omega$ satisfies $\max(p, q) \le r \le p + q + 1$. The vector connecting the centers of the
two subspaces does not necessarily belong to either subspace. This vector causes the additional one in the
upper bound of $r$.

It is easy to verify that the scatter matrix $S$ of the merged subspace $\Omega$ satisfies
$S = S_1 + S_2 + \frac{MN}{M+N}(\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)^T$. We aim to find a sufficient orthogonal spanning of $S$. Let $h_1(x)$ denote the residual vector
of a vector $x$ on $\Omega_1$, $h_1(x) = x - U_1 U_1^T x$. Note that $h_1(x)$ is orthogonal to $U_1$, i.e. $h_1(x)^T U_1 = 0$. Now,
$U' = [U_1, \nu]$ is a set of orthogonal bases to span the merged space, where $\nu = GS(h_1([U_2, \bar{x}_2 - \bar{x}_1]))$
and $GS(\cdot)$ denotes the Gram-Schmidt process. Given the sufficient orthogonal bases, we can obtain the SVD
of $S$.
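For reference, a form of the decomposition cited as Eq. 3.13 that is consistent with the projections defined immediately after it is sketched below. It follows the standard eigenspace-merging argument (express $S_1$, $S_2$ and the mean-difference term in the basis $U'$, then diagonalize the small inner matrix as $R \Lambda R^T$). It should be read as a reconstruction under that assumption, not as a verbatim copy of the report's equation.

```latex
S = U' \left(
      \begin{bmatrix} \Lambda_1 & 0 \\ 0 & 0 \end{bmatrix}
    + \begin{bmatrix} G \Lambda_2 G^T & G \Lambda_2 \Gamma^T \\
                      \Gamma \Lambda_2 G^T & \Gamma \Lambda_2 \Gamma^T \end{bmatrix}
    + \frac{MN}{M+N}
      \begin{bmatrix} g g^T & g \gamma^T \\ \gamma g^T & \gamma \gamma^T \end{bmatrix}
    \right) U'^T
  = U' \left( R \Lambda R^T \right) U'^T
  \qquad (3.13)
```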
where $G = U_1^T U_2$, $\Gamma = \nu^T U_2$, $g = U_1^T(\bar{x}_2 - \bar{x}_1)$ and $\gamma = \nu^T(\bar{x}_2 - \bar{x}_1)$. Now, the eigenvalues of the
merged subspace are given by $\Lambda$ in Eq. 3.13 and the eigenvectors $U$ are simply $U'R$. Note that incrementally updating a
subspace with one observation as in [84] is one special case of merging two subspaces using Eq. 3.13.
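The scatter identity that underlies the merge can be checked numerically. The sketch below compares the scatter matrix of the pooled samples with the sum of the two individual scatter matrices plus the weighted outer product of the mean difference; the data points and function names are made up for illustration, with $N$ samples in the first set and $M$ in the second.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def scatter(vectors):
    """Scatter matrix: sum over samples of (x - mean)(x - mean)^T."""
    mu = mean(vectors)
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for v in vectors:
        c = [v[k] - mu[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                S[a][b] += c[a] * c[b]
    return S


def merged_scatter(set1, set2):
    """S1 + S2 + MN/(M+N) (mean1 - mean2)(mean1 - mean2)^T, without pooling samples."""
    N, M = len(set1), len(set2)
    S1, S2 = scatter(set1), scatter(set2)
    diff = [a - b for a, b in zip(mean(set1), mean(set2))]
    d = len(diff)
    w = M * N / (M + N)
    return [[S1[a][b] + S2[a][b] + w * diff[a] * diff[b] for b in range(d)]
            for a in range(d)]


set1 = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
set2 = [[5.0, 5.0], [7.0, 6.0]]
direct = scatter(set1 + set2)
via_identity = merged_scatter(set1, set2)
max_gap = max(abs(direct[a][b] - via_identity[a][b])
              for a in range(2) for b in range(2))
```

Because the identity needs only the two scatter matrices, the two means and the sample counts, subspaces can be merged without keeping the original samples.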

The other critical step is to determine the similarity between two subspaces. We use two factors to
measure the similarity between two neighboring subspaces $\Omega_1$, $\Omega_2$: the canonical angles (principal angles)
and the data-compactness.

Suppose the dimensions of the two subspaces are $p$ and $q$, $p \ge q$; then there are $q$ canonical angles between
the two subspaces. A numerically stable algorithm [23] computes the angles between all pairs of orthonormal
vectors of the two subspaces as $\cos\theta_k = \sigma_k(U_1^T U_2)$, $k = 1, \dots, q$, where $\sigma_k(\cdot)$ is the $k$th sorted singular value
computed by SVD. The consistency of two neighboring subspaces can be represented as follows.

$$\mathrm{Sim}_1(\Omega_1, \Omega_2) = \prod_{k=q-d_0+1}^{q} \sigma_k(U_1^T U_2) \qquad (3.14)$$

As the dimensionality of subspaces is larger than $d_0$, the initial dimension, we select the $d_0$ largest principal
angles, which approximately measure the angle between two local subspaces. In a 3D space, the largest
canonical angle between two 2D subspaces is equivalent to the angle between the two planes. In this case,
we prefer to merge 2D patches with a small plane-to-plane angle. Note that the merge only happens between
neighboring subspaces. The neighborhood is defined according to the L2 distance between mean vectors. Merging
subspaces with a small principal angle can avoid destroying the local structure of the appearance manifold.
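The 3D example can be worked through numerically. The sketch below computes the principal angles between two planes from the singular values of $U_1^T U_2$. It uses a closed form for the singular values of a 2x2 matrix, which is a shortcut for this toy case only; in general an SVD routine would be used. The planes and function names are chosen purely for illustration.

```python
import math


def singular_values_2x2(M):
    """Singular values of a 2x2 matrix, largest first, via the eigenvalues of M^T M."""
    (a, b), (c, d) = M
    trace = a * a + b * b + c * c + d * d   # trace of M^T M
    det = (a * d - b * c) ** 2              # determinant of M^T M
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return (math.sqrt(max((trace + disc) / 2.0, 0.0)),
            math.sqrt(max((trace - disc) / 2.0, 0.0)))


def principal_angles_deg(U1, U2):
    """Principal angles between two 2D subspaces given as pairs of orthonormal vectors."""
    M = [[sum(x * y for x, y in zip(u, v)) for v in U2] for u in U1]   # U1^T U2
    return [math.degrees(math.acos(min(1.0, s))) for s in singular_values_2x2(M)]


t = math.radians(30.0)
plane_xy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tilted = [[1.0, 0.0, 0.0], [0.0, math.cos(t), math.sin(t)]]   # xy-plane rotated about x
angles = principal_angles_deg(plane_xy, tilted)
```

The smallest angle is zero because the two planes share the x-axis, and the largest angle equals the 30 degree tilt between them, which is the plane-to-plane angle referred to in the text.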

The other factor to consider is data-compactness, which measures how much extra dimensionality is
incurred by a merge operation. Suppose the dimensions of two subspaces $\Omega_1$, $\Omega_2$ are $p$ and $q$, $p \ge q$, and the sorted
eigenvalues of the original merged subspace are $\Lambda_r = (\lambda_1, \dots, \lambda_r)$, $r = p + q + 1$. The similarity based on
data-compactness is defined as

$$\mathrm{Sim}_2(\Omega_1, \Omega_2) = \sum_{i=1}^{p} \lambda_i \Big/ \sum_{i=1}^{r} \lambda_i \qquad (3.15)$$

If $\mathrm{Sim}_2$ is close to one, this indicates the merge operation does not incur any new dimension; on the contrary,
if $\mathrm{Sim}_2$ is small, this indicates that the variations in $\Omega_1$ and $\Omega_2$ cannot be represented by common eigenvectors.
Combining the two factors in Eq. 3.14 and Eq. 3.15, the final similarity between two subspaces is defined in
Eq. 3.16.

$$\mathrm{Sim}(\Omega_1, \Omega_2) = \mathrm{Sim}_1(\Omega_1, \Omega_2) + w_d\, \mathrm{Sim}_2(\Omega_1, \Omega_2) \qquad (3.16)$$

where $w_d$ is the weight to balance these two factors. We use $w_d = 0.2$ in experiments.
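The combined similarity can be sketched from precomputed quantities. In the sketch below the inputs are the cosines of the principal angles and the sorted eigenvalues of the merged subspace. The angle term is taken as the product of the $d_0$ smallest cosines, and the compactness term as the share of energy in the $p$ leading eigenvalues, which is an assumption where the printed summation limits are unclear. Function names and the example numbers are hypothetical.

```python
def sim_angles(cosines, d0):
    """Product of the d0 smallest cosines, i.e. the d0 largest principal angles."""
    ordered = sorted(cosines, reverse=True)
    product = 1.0
    for c in ordered[len(ordered) - d0:]:
        product *= c
    return product


def sim_compactness(eigvals, p):
    """Share of the merged subspace's energy held by its p leading eigenvalues."""
    ordered = sorted(eigvals, reverse=True)
    return sum(ordered[:p]) / sum(ordered)


def similarity(cosines, d0, eigvals, p, w_d=0.2):
    """Principal-angle term plus the weighted data-compactness term."""
    return sim_angles(cosines, d0) + w_d * sim_compactness(eigvals, p)


# Two principal angles with cosines 1.0 and 0.5, merged eigenvalues 4, 3, 2, 1.
score = similarity(cosines=[1.0, 0.5], d0=1, eigvals=[4.0, 3.0, 2.0, 1.0], p=2)
```

Here the angle term is 0.5 and the compactness term is 0.7, so with the weight 0.2 used in the experiments the score is 0.64. The pair of neighboring subspaces with the highest score would be merged first.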

For the discriminative model, we adopt an incremental SVM algorithm, LASVM [27], to train a classifier
between object and background. SVM [179] is able to form the optimal separating function, which reduces
to a linear combination of kernels on the training data, $f(x) = \sum_j \alpha_j y_j K(x_j, x) + b$, with training samples
$x_i$ and corresponding labels $y_i = \pm 1$.

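In practice, this is achieved by maximizing the dual objective function $\max_\alpha W(\alpha)$ with

$$W(\alpha) = \sum_i \alpha_i y_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \quad \text{subject to} \quad \sum_i \alpha_i = 0, \quad A_i \le \alpha_i \le B_i \qquad (3.17)$$

where $A_i = \min(0, Cy_i)$ and $B_i = \max(0, Cy_i)$. Here, $\alpha$ is a vector of weights on $y_i$. An SVM solver can be
regarded as updating $\alpha$ along some direction to maximize $W(\alpha)$. Let $g = (g_1, \dots, g_n)$ denote the gradient
of $W(\alpha)$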


$$g_k = \frac{\partial W(\alpha)}{\partial \alpha_k} = y_k - \sum_i \alpha_i K(x_i, x_k) = y_k - \hat{y}(x_k) + b \qquad (3.18)$$


LASVM suggests that optimization is faster when the search direction mostly contains zero coefficients.
LASVM uses search directions whose coefficients are all zero except for a single +1 and a single -1. The
two non-zero coefficients are called a $\tau$-violating pair $(i, j)$ if $\alpha_i < B_i$, $\alpha_j > A_j$, and $g_i - g_j > \tau$, where $\tau$ is
a small positive value, and LASVM selects the $\tau$-violating pair $(i, j)$ that maximizes the directional gradient
$g_i - g_j$.
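The pair-selection rule can be sketched in a few lines. The sketch assumes the coefficients, gradients and box bounds are already available as plain lists; it illustrates only the selection of the maximally violating pair, not a full LASVM solver, and the function name and numbers are hypothetical.

```python
def select_violating_pair(alpha, g, A, B, tau=1e-3):
    """Return the tau-violating pair (i, j) with the largest g_i - g_j, or None."""
    up = [i for i in range(len(alpha)) if alpha[i] < B[i]]     # alpha_i may still increase
    down = [j for j in range(len(alpha)) if alpha[j] > A[j]]   # alpha_j may still decrease
    if not up or not down:
        return None
    i = max(up, key=lambda k: g[k])
    j = min(down, key=lambda k: g[k])
    return (i, j) if g[i] - g[j] > tau else None


C = 1.0
y = [1, -1, 1, -1]
A = [min(0.0, C * yk) for yk in y]   # lower bounds
B = [max(0.0, C * yk) for yk in y]   # upper bounds
alpha = [0.0, 0.0, 1.0, -0.5]
g = [0.8, -0.3, 0.9, -0.6]

pair = select_violating_pair(alpha, g, A, B)
converged = select_violating_pair(alpha, [0.0] * 4, A, B)
```

Sample 2 has the largest gradient but its coefficient already sits at its upper bound, so the chosen pair is (0, 3). When no pair violates the tolerance the function returns `None`, which corresponds to the stopping condition.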

We compare our co-trained tracker with two generative methods, IVT [91] (G1) and our
multiple linear subspaces algorithm (G2), and three discriminative methods, online selection of
discriminative color features (D1) [33], our online SVM method (D2) and ensemble tracking (E.T) [10]. G1 uses a
single 15D linear subspace and updates it incrementally. Note that D1 does not consider tracking with large
scale change and rotation. G1, G2, D2 and the co-trained tracker use the same parameters in the CONDENSATION
algorithm, but G1, G2 and D2 use self-learning to update their models. We compare these methods
on challenging data sets, which contain image sequences of various types of objects. A detailed comparison
can be found in Table 3.3 [193]. In experiments, we frequently find that the co-trained tracker has better
self-awareness of current tracking performance and can safely enlarge the search range (by changing the
diffusion dynamics) without being confused by distracters in the background. Also, the co-trained tracker
successfully avoids drifting caused by varying viewpoints and illumination changes.

Some of the visual results are shown in Figure 3.12.


6.  Co-trained  Particle  Filter  Framework  for  Robust  Object  Tracking 

Even though our proposed co-trained trackers obtain very good results, the system is not fast enough for
use in a real-time environment. To improve it, we propose a new co-training framework that uses a
cascade particle filter to label incoming data continuously and to update hybrid models online, both generatively and
discriminatively. Each layer in the cascade contains one or more generative or discriminative
appearance models. Organizing the particle filter as a cascade enables the efficient evaluation of
multiple appearance models with different computational costs and thus improves the speed of the tracker. The
proposed online framework provides temporally local tracking that adapts to appearance changes. Moreover,
it provides an object-specific detection ability that allows it to reacquire an object after total occlusion.

Seq    Frm No  Occluded       G1           G2           D1           D2         Our method
               Frms/Times   FF   TR/LR   FF   TR/LR   FF   TR/LR   FF   TR/LR   FF   TR/LR
Seq1   761     0/0          17   0/0     261  0/0     n/a  n/a     491  0/3     749  0/1
Seq2   313     0/0          75   0/2     282  0/3     9    0/0     214  0/2     295  0/1
Seq3   184     30/1         50   0/0     50   0/0     25   0/0     50   0/0     154  1/0
Seq4   338     93/2         33   0/0     70   0/1     33   0/0     72   0/1     230  2/1
Seq5   140     0/0          11   0/0     15   0/0     6    0/0     89   0/3     140  0/0
Seq6   945     143/4        163  1/0     506  1/0     382  0/0     54   0/1     798  4/0

Table 3.3: Comparison of different methods. G1: IVT [91]; G2: incremental learning of multiple subspaces;
D1: online selection of discriminative color features [33]; D2: online SVM; E.T: ensemble tracking [10].
D1 uses color information, which is not available for Seq1 and Seq6.

(a) Tracking and reacquisition with abrupt motion and blur

(b) Tracking and reacquisition after a long absence from the field of view

Figure 3.12: Tracking various types of objects in outdoor environments

We  formulate  the  co-training  in  the  last  stage  of  our  cascade  particle  filter.  This  layer  is  the  most 
important  layer  where  the  final  result  is  given  through  a  co-decision  process.  This  result  is  then  used  to 
update  all  of  the  other  models  in  other  layers  (as  shown  in  Figure  3.13).  Due  to  the  flexibility  of  our 
framework,  there  are  some  obvious  advantages. 

Fast running time: As mentioned above, when sampling, many samples have a low confidence score
(close to zero) because they are not related to the object of interest. To improve performance, the
cascade particle filter places cheaper models in the first layers and
more expensive models in the later layers. This configuration efficiently boosts the speed of
the whole framework to achieve real-time performance.

General object tracker: Considering tracking as a semi-supervised problem, in which we have only very
limited labeled data and no explicit offline-trained model as in [90], the idea is to build an appearance
model on the fly that adapts to changes in viewpoint, lighting conditions and so on during tracking. To
fulfill this task, we adopt co-training as the framework to enhance the powers of different models. In online
training, it is hard for one model to train on its own because early mistakes can reinforce themselves,
and every model has its own weaknesses. The best solution is for different models to cooperate with
each other to produce a strong one, even when many of them fail at some point in time. Co-training is an ideal
framework to address this issue.

In the cascade particle filter, our co-training is set up in the last stage to give the final decision in the
tracking framework, while all other stages determine whether a sample is good enough to pass to the next layer or
not. Unlike the framework proposed in [90], ours naturally extends to use multiple models in each
stage of the cascade to enhance their complementary powers.

Figure 3.13: Our proposed framework: a co-trained cascade particle filter (CCPF) with three stages.

Combination of different models: There is often a choice of direction when building
a model: generative or discriminative. The goal of the generative model is to describe the
appearance of the object, while the discriminative model aims to find a way to separate the object from the
background. As discussed in Section II, discriminative models usually perform better than generative ones.
This is natural because it is always harder to “describe” the appearance of an object X than to “tell” what is
different between it and others. However, each of the models has its own advantages and disadvantages.
Our framework offers an easy way to incorporate both of them and combine their strengths.

Moreover, in practice, using one type of feature may not be enough to produce a robust tracker. For
example, to track a pedestrian, not only the curves around the body may be of interest, but also the colors of
the clothes and the textures on the whole pedestrian body are very important. To improve performance, we
propose to use different types of features: color, local patch, and object template.

Strong reacquisition: It is worth emphasizing that the outcome of our framework is not only the
tracking result but also a detector for that specific object, which is extremely useful for reacquisition. For
instance, when the object leaves the field of view and comes back, without any knowledge about exit/entry
points, the only option is an exhaustive search over the frame to find the object. With the learned model,
our tracker can find the object easily. Total occlusion is treated as the same case.

Experiment settings: The framework runs from a small amount of initial labeled data. Ideally,
the object of interest is tagged in the first 5-10 frames. A simple template matching method
is also implemented to help initialize the tracker when the object is provided only in the first frame. We
adopt conservative evaluation criteria in decision making for each model. The framework is implemented
as follows. Given the labeled data in the first few frames, we initialize all of the models independently.
When tracking starts, all particles are sequentially passed through the cascade particle filter, which consists
of multiple stages. At each stage, the models simultaneously evaluate all the samples and co-decide to filter
out a number of negative samples, then propagate the rest to the next stage. At the final stage, a co-training
process takes place in which the models train each other. In our implementation, we adopt one generative and
one discriminative model in this stage.

Comparative analysis. Co-trained tracker vs. others: We compare our co-trained tracker with two generative
methods, Incremental Visual Tracking (IVT) [139] and fragment-based tracking (FT) [2], and
three discriminative methods, online selection of discriminative color features (OSDC) [33], ensemble
tracking (ET) [10], and multiple instance learning tracking (MIL) [11]. In IVT, we use a single 15D
linear subspace and update it incrementally.


Video Sequence  Frames  Occlusion  IVT  OSDC  ET   FT   MIL  Ours
Dark            761     0          17   n/a   94   7    230  759
Jumping         313     0          75   313   44   263  313  313
Handheld        140     0          11   6     22   46   86   140
Vehicle         945     143        163  n/a   10   547  202  802
UAVperson1      184     30         50   50    53   52   154  154
UAVperson2      338     93         33   8     118  92   32   240

Table 3.4: Comparison of different methods. IVT: incremental visual tracking [139]; OSDC: online selection
of discriminative color features [33]; ET: ensemble tracking [10]; FT: robust fragments-based
tracking using the integral histogram [2]; MIL: visual tracking with online multiple instance learning [11];
Ours: our co-trained tracker. OSDC uses color information, which is not available for sequences “Dark”
and “Vehicle”.

The comparison demonstrates that the co-trained tracker performs more robustly than other methods.
Note that OSDC requires color information, thus it cannot process some sequences, which are indicated as
“n/a”. The visual results are shown in Figure 3.14, where the tracked objects are bounded with green boxes.
A red box indicates that none of the models is updated in that frame. In experiments, we frequently find that
the co-trained tracker has better self-awareness of current tracking performance and can safely enlarge the
search range (by changing the diffusion dynamics) without being confused by distracters in the background.
Also, it can successfully avoid drifting caused by varying viewpoints and illumination changes.


Tracker      Average Center Location Error (pixels)  Running Time (fps)
One-layer    4.38                                    4.25
Two-layer    4.93                                    9.34
Three-layer  5.74                                    14.79

Table 3.5: Comparison between cascade setup and non-cascade setup in terms of precision and running
time.

Cascade vs. non-cascade: To demonstrate the efficiency of using a cascade particle filter, we set up three
trackers that have one layer, two layers, and three layers, respectively. The one-layer tracker uses
our  proposed  co-trained  models.  The  two-layer  tracker  contains  the  co-trained  tracker  and.  After  the  first 
layer,  half  of  the  samples  are  kept;  while  after  the  second  layer,  25%  of  the  samples  are  preserved.  Therefore, 




(a) Fragment-based Tracking (FT) [2]


(b)  Multiple  instance  learning  tracking  (MIL)  [11] 


(c)  Our  co-trained  tracker 


Figure 3.14: Some visual comparative results between our co-trained tracker and other trackers in several
sequences: “Handheld”, “UAVperson2”, and “Vehicle” (from left to right).


with  the  sampling  of  600  particles,  300  remaining  particles  are  evaluated  by  the  final  stage  of  the  two-layer 
tracker;  whereas  75  best  particles  are  evaluated  in  the  last  stage  of  the  three-layer  tracker. 
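The particle counts quoted for the cascade can be reproduced with a small sketch. It assumes each non-final stage simply ranks the surviving particles by its own score and keeps the top fraction; the three scoring functions are toy stand-ins of increasing precision, and all names are hypothetical. The actual appearance models and the co-decision rule of the report are not modeled.

```python
def cascade_filter(particles, stages):
    """stages: list of (score_fn, keep_fraction), applied in order.

    Returns the surviving particles and the number of evaluations made per stage.
    """
    survivors = list(particles)
    evaluations = []
    for score_fn, keep_fraction in stages:
        evaluations.append(len(survivors))
        ranked = sorted(survivors, key=score_fn, reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors, evaluations


TARGET = 0.3


def cheap(s):
    """Coarse, inexpensive score."""
    return -abs(round(s, 1) - TARGET)


def medium(s):
    """Finer score at a higher cost."""
    return -abs(round(s, 2) - TARGET)


def expensive(s):
    """Stand-in for the co-trained models in the last stage."""
    return -abs(s - TARGET)


particles = [k / 600.0 for k in range(600)]
survivors, evaluations = cascade_filter(
    particles, [(cheap, 0.5), (medium, 0.25), (expensive, 1.0)])
```

With 600 particles the stages evaluate 600, 300 and 75 samples, so the most expensive model runs on only one eighth of the particles.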

The performance of these trackers is compared based on the average center location error and the running
time, as shown in Table 3.5. The six sequences used in the previous experiment are adopted. The
results clearly show that the cascade particle filter improves the running time by a large margin
while producing comparably robust results.

Reacquisition performance: One of the main advantages of our tracker is its reacquisition ability,
which is not well explored by other state-of-the-art algorithms. Our method not only ensures the quality
of long-term tracking results but also provides the capability to re-detect an observed object after it leaves
the field of view. To evaluate this reacquisition ability, we synthesize the behavior of an object leaving the
field of view by creating new sequences in which several frames of the original one are replaced. Starting
from the beginning of the sequence (after ignoring some initial frames for the learning phase), for each
step of 50 frames we replace 30 consecutive frames by a synthetic background image. Note that each step
forms a new sequence; 109 new sequences are thus created. Some examples of the new data sets are shown
in Figure 3.15.
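The synthesis procedure described above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name, the `skip` parameter, and the choice to place the 30 replaced frames at the start of each 50-frame step are assumptions.

```python
def synthesize_gap_sequences(frames, background, skip=0, step=50, gap=30):
    """Return a list of (start_index, new_sequence) pairs.

    For every step of `step` frames (after skipping `skip` initial frames),
    one new sequence is built in which `gap` consecutive frames are replaced
    by `background`. Placing the gap at the step boundary is an assumption.
    """
    sequences = []
    for start in range(skip, len(frames) - gap + 1, step):
        new_seq = list(frames)                       # copy of the original sequence
        new_seq[start:start + gap] = [background] * gap
        sequences.append((start, new_seq))
    return sequences
```

Each returned sequence has the same length as the original, so ground-truth annotations for the untouched frames remain aligned.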
To evaluate the reacquisition ability, we count the number of tracked (reacquired) cases, the number of
missed cases, and the number of false alarm cases. If the tracker can reacquire the object within at most
10 frames after the object re-appears, and the overlapped region between the tracked and the ground-truth
windows is larger than 50% of the ground-truth window, it is counted as a tracked case. The details of the
results are shown in Table 3.6.
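The tracked-case criterion can be written down directly. The sketch below assumes boxes in `(x, y, w, h)` form and a tracker output of `None` when nothing is reported; both conventions, and the function names, are illustrative assumptions. The report does not state how missed cases are separated from false alarms, so that distinction is not modelled here.

```python
def overlap_ratio(track_box, gt_box):
    """Intersection area divided by the ground-truth area; boxes are (x, y, w, h)."""
    tx, ty, tw, th = track_box
    gx, gy, gw, gh = gt_box
    iw = max(0, min(tx + tw, gx + gw) - max(tx, gx))
    ih = max(0, min(ty + th, gy + gh) - max(ty, gy))
    return (iw * ih) / float(gw * gh)


def is_reacquired(track_boxes, gt_boxes, max_delay=10, min_overlap=0.5):
    """True if, within `max_delay` frames after the object re-appears, the
    tracked window covers more than `min_overlap` of the ground-truth window.

    Both lists start at the frame in which the object re-appears.
    """
    for t, g in list(zip(track_boxes, gt_boxes))[:max_delay]:
        if t is not None and overlap_ratio(t, g) > min_overlap:
            return True
    return False
```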


Video Sequence   No. of cases   No. of tracked cases   No. of missed cases   No. of false alarms
Clutter                    29                     29                     0                     0
Handheld                    3                      3                     0                     0
Jumping                     5                      5                     0                     0
Scale                      37                     24                     9                     4
Sylvester                  26                     20                     4                     2
UAVperson1                  4                      4                     0                     0
UAVperson2                  5                      5                     0                     0
Total                     109                     90                    13                     6

Table  3.6:  Reacquisition  performance  of  our  tracker. 

It can be observed that most of our missed cases come from the two sequences “Scale” and “Sylvester”.
This is because they contain extremely difficult cases in which the object appearance changes in
scale, pose, and lighting while the object is synthetically out of the field of view.


Figure  3.15:  Synthesized  data  sets  for  testing  reacquisition.  From  left  to  right:  before  leaving-field-of-view, 
synthesized  background,  reappeared  object. 


7.  Partial  Occlusion  Handling  in  Object  Tracking 

In our previously proposed methods, we did not address the occlusion issue explicitly but used a threshold
to control the update process, so that the appearance models are updated only when there is no occlusion.
However, defining such a threshold is not straightforward: there is a trade-off between a
“conservative” update and an “easygoing” update in adapting to appearance changes. To address this
issue, we improve our current approach with a co-training framework of generative
and discriminative trackers that detects the occluding region and continuously updates both models using the
information from the non-occluded part. The generative model encodes all of the appearance variations using a
(a) Occlusion  (b) Discriminative  (c) Generative  (d) KLT  (e) Movement

Figure  3.16:  Partial  occlusion  observations  on  our  trackers  and  KLT  feature  movement 


(b)  Generative  tracker  model  update  (c)  Discriminative  tracker  model  update 


Figure  3.17:  Occlusion  recovery  from  our  trackers  (image  is  scaled  to  32x32  for  training) 


low-dimensional subspace, which helps provide a strong reacquisition ability. Meanwhile, the discriminative
classifier, an online support vector machine, focuses on separating the object from the background using a
Histograms of Oriented Gradients (HOG) feature set. To detect occlusion, a likelihood map is generated by
the two trackers through a co-decision process. To handle the cases in which these two trackers disagree,
the movement vote of KLT local features [145] is used as a referee. Finally, each tracker
recovers the occluded region and updates its model using the new non-occluded information.

To detect the occlusion, each model estimates the occlusion likelihood of each block in a sample, and
the two models make the decision together. The KLT features are also generated and tracked in order to
determine when occlusion happens through a movement voting process. Some observations obtained from the
generative model, the discriminative model, and the KLT tracker when occlusion appears are shown in Figure 3.16.

7.1.  Generative  Tracker 

We propose to use a single linear subspace to approximate the appearance model of the object. This is
close to the incremental visual tracker (IVT) [139], but with partial occlusion handling. To detect partial
occlusion, as discussed in Section 2, the projection error is split into blocks as done in the discriminative
tracker. We simply compute the occlusion likelihood from the projection error over each block. The
projection error at the i-th block is:


$$\varepsilon_i(x_i) = \left\lVert x_i - U U^{T} x_i \right\rVert \tag{3.19}$$

These likelihood values are then used to generate a binary occlusion likelihood map: a block is assigned 0
if its score is lower than 50% of the maximum block score, and 1 otherwise.
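A minimal NumPy sketch of the block-wise projection error and the resulting binary map is given below. It assumes `U` has orthonormal columns, a 32x32 patch (the training size mentioned for Figure 3.17), and 8x8 blocks; the block size, the use of the L2 norm, and the function names are assumptions made for illustration.

```python
import numpy as np


def block_projection_errors(x, U, patch_shape=(32, 32), block=8):
    """Per-block norm of the residual x - U U^T x.

    x: flattened patch of length h*w; U: (h*w, k) matrix with orthonormal columns.
    Returns an (h//block, w//block) array of block errors.
    """
    h, w = patch_shape
    residual = (x - U @ (U.T @ x)).reshape(h, w)
    # Axes (1, 3) index the pixels inside each block.
    blocks = residual.reshape(h // block, block, w // block, block)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))


def occlusion_map(errors, ratio=0.5):
    """0 where the block score is below `ratio` of the maximum block score, 1 otherwise."""
    return (errors >= ratio * errors.max()).astype(np.uint8)
```

With this convention a value of 1 marks blocks whose reconstruction error is comparable to the worst block, that is, the blocks most likely to be occluded.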
7.2. Discriminative Tracker

We also compute the classifier score on each block, instead of on the whole sample patch, to infer where the
partial occlusion occurs. The decision function of a conventional SVM is

$$f(x) = \beta + \sum_{k=1}^{n_{sv}} \alpha_k K(x, x_k), \tag{3.20}$$

where $x_k$, $k \in \{1, 2, \ldots, n_{sv}\}$, are the support vectors, $K(x, x_k)$ is the kernel function, and
$\beta$ is the bias constant. Here, a linear kernel is used, which means $K(x, x_k)$ is the inner (scalar)
product of the two vectors. We now have to distribute the bias constant $\beta$ over the blocks $B_i$, so that
the contribution of each block to the final classifier confidence score can be computed after subtracting
the local bias $\beta_i$ from the feature inner product over that block.

The bias constant is distributed over the blocks as:

$$\beta_i = \cdots \left( \sum_{p=1}^{N^{+}} B^{+}_{p,i} + \sum_{q=1}^{N^{-}} B^{-}_{q,i} \right) \tag{3.21}$$

The occlusion likelihood map is generated as a binary image based on the block scores: 0 for a negative
score and 1 otherwise. Each pixel in the likelihood image corresponds to a block in the sample. To
recover from the occlusion, the non-occluded part is kept while the occluded area is inferred from a previous
frame (as shown in Figure 3.17(b)). In a long-term partial occlusion, we can consider this step as a recursive
process in which the occluded area of the object in the current frame is projected from that of the object in the
previous frame, which may itself have been drawn from its predecessor.
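The map generation and the recovery step can be sketched as below. The per-block classifier scores are taken as an input, since their exact computation depends on the bias distribution of Eq. (3.21); the 8x8 block size and the function names are assumptions.

```python
import numpy as np


def block_occlusion_map(block_scores):
    """Binary map as described in the text: 0 for a negative block score, 1 otherwise."""
    return (np.asarray(block_scores) >= 0).astype(np.uint8)


def recover_patch(current, previous, block_map, block=8):
    """Keep the pixels of blocks marked 1 and fill blocks marked 0 from `previous`.

    `previous` is the (possibly already recovered) patch of the previous frame.
    """
    # Expand the block-level map to a pixel-level boolean mask.
    mask = np.repeat(np.repeat(block_map, block, axis=0), block, axis=1).astype(bool)
    return np.where(mask, current, previous)
```

Passing the recovered patch back in as `previous` for the next frame gives the recursive behavior described for long-term partial occlusion.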

7.3.  Local  Features  Movement  Voting  Using  KLT 

Taking advantage of the simplicity and fast computation of KLT features [145], tracking consistency is
checked based on the movement of these features in the object region at every frame. Due to the discontinuity
between non-occluded and occluding regions, some KLT features are driven with the same direction and
velocity, which differ from those of the remaining part. Taking this observation into account, we propose a
voting scheme on the movement of these local features to detect occlusion.

After being detected in the first frame, these features are tracked in every frame. After removing all of
the outliers, the displacement magnitude of each feature is normalized to [0, 1] and encoded in a 4-bin
histogram. The direction of the movement is encoded in an 8-bin histogram, each bin of which covers a $\pi/4$ span.
All displacement vectors are thus accumulated into a 4x8 2D histogram.
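The 4x8 movement histogram can be illustrated with a short NumPy sketch. Normalizing magnitudes by the largest displacement in the frame is an assumption (the text only states that magnitudes are normalized to [0, 1]), as is the function name; outlier removal is assumed to have happened beforehand.

```python
import numpy as np


def motion_histogram(displacements, mag_bins=4, dir_bins=8):
    """Accumulate (dx, dy) feature displacements into a mag_bins x dir_bins histogram."""
    d = np.asarray(displacements, dtype=float).reshape(-1, 2)
    hist = np.zeros((mag_bins, dir_bins), dtype=int)
    if len(d) == 0:
        return hist
    mag = np.hypot(d[:, 0], d[:, 1])
    if mag.max() > 0:
        mag = mag / mag.max()                      # normalize magnitudes to [0, 1]
    ang = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    # Clip so that magnitude 1.0 and angle 2*pi fall into the last bin.
    m_idx = np.minimum((mag * mag_bins).astype(int), mag_bins - 1)
    a_idx = np.minimum((ang / (2 * np.pi / dir_bins)).astype(int), dir_bins - 1)
    np.add.at(hist, (m_idx, a_idx), 1)             # unbuffered accumulation
    return hist
```

With 8 direction bins over the full circle, each bin spans $\pi/4$.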

When a majority of KLT features in a region have different movement behavior than the rest,
partial occlusion is detected. In practice, we choose $\theta = 0.7$. It is important to note that the KLT features
are re-initialized after occlusion, and that this step is applied only as a referee when the generative and
discriminative trackers disagree on occlusion detection.

7.4.  Experiments 

In the first frame, we manually select the object and apply simple template matching for the next 4 frames.
These initial labeled data are then transferred to both the generative and discriminative trackers for training.
Our Bayesian framework generates 600 particles at each frame. The combined tracker is implemented in
C++ and runs at 4 fps on an Intel QuadCore 3.0 GHz system. We tested our algorithm on several challenging
published video sequences of different types of objects in indoor and outdoor environments. The related
state-of-the-art trackers included in the comparison are the Co-Tracker [193], which is the most closely related
to our tracker, the Frag-Tracker [2], the Online and Semi-Boosting Trackers (OAB, SB) [62, 64], the P-N
Tracker (PNT) [75], the MILTracker [11], and its new no-regret variation, the MIO Tracker [89]. We use the
(a) Occluded Face 2


(b)  Person 


(c)  Tiger  2 


---MILTracker  Frag-Tracker  - OurTracker 


Figure 3.18: Some screen shots from the testing results. For clarity, we show only the results of
Frag-Tracker [2] and MILTracker [11] in comparison with our tracker.

provided results and published source code from the authors (footnotes 1, 2, 3). To evaluate the precision of
our tracker, we used the same measurement, the average center location error (in pixels), used for evaluation
in [11, 89]. All of the testing sequences contain long-term and heavy partial and total occlusions, as well as
challenging appearance changes such as illumination changes, abrupt motion, rotation, and cluttered backgrounds.


Video Sequence    Frames   GT   DT   FT   OAB   ST   PNT   MILT   MIO   CoT   Ours
Coke Can             292  102    9   67    25   85     8     21    22    10      8
Occluded Face 1      900   86   17    7    44   41     8     27    14    16      5
Occluded Face 2      808   14   12   21    21   43     8     20    13    12      7
Person               200   35   73   44    37  154    44     34   n/a    33      5
Tiger 1              354   52    6   40    35   46    13     15    24     5      4
Tiger 2              365   43    7   37    34   53    21     17    23     7      5

Table 3.7: Average center location errors (in pixels) on different challenging datasets. (GT: Generative
Tracker, DT: Discriminative Tracker, FT: Frag-Tracker [2], OAB: Online Boosting Tracker [62], ST: Semi-Boosting
Tracker [64], PNT: P-N Tracker [75], MILT: MILTracker [11], MIO: MIL No Regret Tracker [89], CoT: Co-Tracker [193].)
The best performance is in bold, the second best is in italic.


8.  Context  Tracker:  Exploring  Supporters  and  Distracters  in  Unconstrained  Environments 

A major research axis has focused on building a strong model that encodes the variations of object
appearance while distinguishing the object from the background. In doing so, a fundamental dilemma occurs:
the more complex the appearance model, the more expensive it is. At the extreme, the emergence of cluttered
background and the occurrence of regions having similar appearance to the target make appearance modeling
very challenging.

1 MILTracker: http://vision.ucsd.edu/~babenko/project_miltrack.shtml
2 Frag-Tracker: http://www.cs.technion.ac.il/~amita/fragtrack/fragtrack.htm
3 Semi-boosting Tracker: http://www.vision.ee.ethz.ch/boostingTrackers/index.htm

In fact, there is additional information that can be exploited instead of using only the object region.
Context information has been applied actively in object detection [45], object classification [88, 114], and
object recognition [138]. It has been employed recently in several tracking methods [63, 191]. One reason for
context being overlooked is the fast run-time requirement. Also, visual tracking, especially single object
tracking, is considered a semi-supervised problem in which the only known data is the object bounding
box in the first frame (or in the first few frames), which means that learning such a context needs to be
performed on-the-fly.

Here, we propose to exploit the context information by expressing it in two different terms: 1) Distracters
are regions that have similar appearance to the target; 2) Supporters are local key-points around the
object having motion correlation with our target in a short time span. Supporters occur in regions belonging
to the same object as the target, but are not included in the initial bounding box. In other words, the goal
of our algorithm is to find all possible regions which look similar to the target to prevent drift, and to look
for useful information around the target to obtain strong verification. The target and distracters are detected
using shared sequential randomized ferns [117]. They are represented by individual evolving templates.
The supporters, on the other hand, are represented as keypoints, and described using descriptors of the local
region around them.

8.1. Distracters


(a) Initialization  (b) Exploit all other distracters  (c) Tracking the distracters even when the object leaves FoV  (d) Reacquire the target without confusion with other distracters

Figure  3.19:  Automatically  exploiting  distracters.  Target  is  in  green,  distracters  are  in  yellow. 

Distracters are regions which have similar appearance to the target and consistently co-occur
with it. Usually, distracters are other moving objects sharing the same object category as our target
(Figure 3.19). Building an appearance model to distinguish objects of the same type is equivalent to developing
a recognition system, which needs a large amount of supervised samples to train. However, in visual tracking,
the tracker has temporal and spatial information that helps to determine which regions are “dangerous” and
should be precluded. To prevent our tracker from drifting to these regions, we propose to detect them and
initiate a simple tracker for each of them so that we can minimize confusion during tracking.

Due to the efficiency of the randomized ferns classifier, which is widely used in recognition [28, 118] and
tracking [75], we employ it to detect possible distracters in every frame. Randomized ferns were originally
proposed by Ozuysal et al. [117] to increase the speed of randomized forests [29]. Unlike the tree structure in
a randomized forest, ferns, which have non-hierarchical structures, consist of a number of binary testing
functions. In our case, each of them corresponds to a set of Binary Pattern features. Each leaf in a fern records
the number of positive and negative samples added during training. A test sample is evaluated by
calculating its binary pattern features, which leads it to a leaf in the fern. After that, the posterior probability
for that input testing sample with feature vector $x_i$ to be labeled as an object ($y = 1$) by a fern $j$ is computed
as $\mathrm{Pr}_j(y = 1 \mid x_i) = p/(p + n)$, where $p$ and $n$ are the numbers of positive and negative samples
recorded by that leaf. The posterior probability is set to zero if there is no record in that leaf. The final
probability is calculated by averaging the posterior probabilities given by all ferns:

$$\mathrm{Pr}(y = 1 \mid x_i) = \frac{1}{T} \sum_{j=1}^{T} \mathrm{Pr}_j(y = 1 \mid x_i) \tag{3.22}$$

where $T$ is the number of ferns. To improve the running time, these randomized ferns are shared between
our object detector and the distracter detector. Each tracker controls the posterior probability by adding its
positive and negative samples to the ferns according to the P-constraints and N-constraints, respectively, as
in [75]. The P-constraints force all samples close to the validated trajectory to have a positive label, while
the N-constraints label all patches far from the validated trajectory as negative. Unlike [75],
we do not add hard negative samples, in order to avoid over-fitting. Also, during tracking, when the appearance
of a distracter becomes different from our target, we discard it. This emphasizes that our focus is on
tracking a single target, not on multiple target tracking. The intuition is clear: when several
objects have similar appearance to our object, the target tracker pays attention to them; if these distracters
change their appearance and no longer look like our object, they can be ignored.
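The leaf bookkeeping and the averaging in Eq. (3.22) can be sketched as follows. Simple pairwise comparisons `x[i] > x[j]` stand in for the Binary Pattern features used in the report; that substitution, the class name, and the method names are assumptions made only for illustration.

```python
import numpy as np


class Fern:
    """One fern: a fixed list of binary tests whose outcomes index a leaf."""

    def __init__(self, pairs):
        self.pairs = pairs                          # (i, j) pairs; test is x[i] > x[j]
        n_leaves = 2 ** len(pairs)
        self.pos = np.zeros(n_leaves, dtype=int)    # positive samples seen per leaf
        self.neg = np.zeros(n_leaves, dtype=int)    # negative samples seen per leaf

    def leaf(self, x):
        idx = 0
        for i, j in self.pairs:
            idx = (idx << 1) | int(x[i] > x[j])
        return idx

    def update(self, x, label):
        if label == 1:
            self.pos[self.leaf(x)] += 1
        else:
            self.neg[self.leaf(x)] += 1

    def posterior(self, x):
        k = self.leaf(x)
        p, n = int(self.pos[k]), int(self.neg[k])
        return p / (p + n) if p + n > 0 else 0.0    # zero when the leaf is empty


def ferns_probability(ferns, x):
    """Average of the per-fern posteriors, as in Eq. (3.22)."""
    return sum(f.posterior(x) for f in ferns) / len(ferns)
```

Because each leaf stores only two counters, several trackers can share the same test layout while keeping separate counts, which is one way to read the sharing described above.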

Therefore, a sample is considered a distracter candidate if it passes the random ferns with a probability
$\mathrm{Pr}(y = 1 \mid x_i) > 0.5$ and is not the target. How to determine whether a candidate is our target is
discussed in Section 4. We maintain a sliding window of $M$ frames and count the frequency $f_{d_k}$ of a
candidate $k$ based on its appearance consistency and spatial consistency relative to the target. A candidate is
then classified as a distracter as follows:

$$P_d(y_d = 1 \mid x_i) = \begin{cases} 1 & \text{if } f_{d_k} > 0.5 \text{ and } d(x_i, M) > 0.8 \\ 0 & \text{otherwise} \end{cases} \tag{3.23}$$

where $P_d(y_d = 1 \mid x_i)$ is the probability that a candidate $i$ with feature vector $x_i$ has label $y_d$, while


(a) Detecting all supporters  (b) Learning the active supporters  (c) Avoid drifting to other regions while target is under occlusion  (d) Reacquire the target with strong supporter model


Figure 3.20: Learning supporters. Active supporters are pink dots, passive supporters are blue dots, and the
object center is a black dot.

$d(x_i, M)$ is the confidence of this candidate evaluated by the template-based model of the target. The first
condition allows us to detect distracters which repeatedly co-occur with our target, while the second one helps
to exploit distracters having very similar appearance to our target.
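The decision rule of Eq. (3.23) can be sketched as below. The report does not give the window length $M$ or the exact way the frequency is counted, so the default window of 10 frames and the definition of the frequency as the fraction of window frames in which the candidate was consistent are assumptions, as are all names.

```python
from collections import deque


class CandidateHistory:
    """Sliding window over the last `window` frames for one distracter candidate."""

    def __init__(self, window=10):                  # window length is hypothetical
        self.seen = deque(maxlen=window)

    def observe(self, consistent):
        """Record whether the candidate was consistent with the target in this frame."""
        self.seen.append(bool(consistent))

    def frequency(self):
        return sum(self.seen) / len(self.seen) if self.seen else 0.0


def is_distracter(frequency, confidence, min_freq=0.5, min_conf=0.8):
    """Return 1 if both conditions of Eq. (3.23) hold, 0 otherwise."""
    return 1 if (frequency > min_freq and confidence > min_conf) else 0
```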

8.2.  Supporters 

We aim to build an efficient set of supporters which helps to quickly verify the location of the target.
Supporters are features which consistently occur around the object, as shown in Figure 3.20. They also have a
strong correlation in motion with our target. It is worth noting that our goal is tracking in unconstrained
environments with several challenges such as frame-cuts and abrupt motion due to hand-held camera recording.
This prevents us from using a motion model to predict the location of a target based on the motion of the
supporters as in [63] or of the auxiliary objects as in [191]. We also would like to emphasize that our candidate
responses are obtained based on detection. The supporters are also detected from the local region around each
candidate. After that, these supporter detection responses are matched with the ones from previous frames to
find the co-occurrence between them and our target. In fact, from these results, the motion correlations are
also inferred without using the very complex motion models needed in unconstrained environments. Moreover,
unlike the supporters proposed in [63], which are expensive to detect and match in the whole frame, our
supporters are efficiently detected and matched around the locations of the very few candidates having high
probability of being the target in each frame.

To detect supporters, we use the Fast Hessian detector and employ the SURF descriptor as in [18] to
describe the region around them. We store all of these supporters in a sliding window of $k$ frames ($k = 5$
in our implementation). There are two types of supporters: active and passive. The active supporters are
the ones co-occurring with our target with high frequency, $f_s > 0.5$, within the sliding window, while the
passive ones are the rest. When there are regions having similar appearance to our target but not being tracked
by distracter trackers, all SURF features are detected around these regions. After that, they are matched
to our supporter model, which basically consists of the latest descriptors of the supporters in the sliding
window. Finally, the supporting score is computed as follows:

$$S_i = \frac{n_{am}}{n_{ta}} \tag{3.24}$$

where $n_{am}$ and $n_{ta}$ are the numbers of active matched supporters and total active supporters in the model.
A supporter model is considered strong if $S_i > 0.5$ and $n_{ta} > 5$ (to avoid the unstable information within
non-textured regions around our target). Then all of the matched results are used to update the supporter
model. Note that the unmatched results are also added to the model.
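The active/passive split and the strength test built on Eq. (3.24) are simple enough to sketch directly. Interpreting the frequency $f_s$ as the match count divided by the window length $k$ is an assumption, and the function names are hypothetical.

```python
def active_supporters(match_counts, k=5, min_freq=0.5):
    """Return the ids of supporters whose frequency within the k-frame window exceeds min_freq.

    match_counts maps a supporter id to the number of window frames in which
    it co-occurred with the target.
    """
    return {s for s, c in match_counts.items() if c / k > min_freq}


def supporting_score(n_am, n_ta):
    """Eq. (3.24): matched active supporters over total active supporters."""
    return n_am / n_ta if n_ta > 0 else 0.0


def is_strong_model(n_am, n_ta, min_score=0.5, min_active=5):
    """Strong if the score exceeds 0.5 and there are more than 5 active supporters."""
    return n_ta > min_active and supporting_score(n_am, n_ta) > min_score
```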

8.3.  Context  Tracker 

We use the P-N Tracker [75] as our basic target tracker with several extensions. First, we extend the
randomized ferns to accept multiple objects; this is not equivalent to a multi-class classifier, because each
object preserves its own posterior probability while the objects may share the same object type as our target.
Second, we apply our new 6bitBP feature, which helps to boost the speed of the detector. Our 6bitBP makes use
of the constant value of each whole patch during evaluation. Third, instead of using only the first initial
patch as the object model, which is quite conservative and vulnerable to appearance changes, we use the
online template-based object model as in [74]. However, we improve this model by organizing it as a binary
search tree using k-means: the model is iteratively split into two subsets to form a binary tree. By doing
this, the computational complexity of evaluating a sample is O(log n) instead of O(n) for a brute-force search.
This improvement is important for the running time because the online model grows linearly to
adapt to appearance changes. It is worth noting that other tracking methods can also be extended using our
concepts. However, we choose the PN-Tracker because it uses a scanning window to search for all possible
candidates in the whole image, which helps to explore the context at the same time. Also, the randomized
forest is extendable, which reduces the cost of initializing a totally new tracker for a distracter.

As discussed, distracters are regions which have similar appearance to our target. In our tracker, the
confidence score of a testing sample is computed using Normalized Cross-Correlation (NCC) between it and the
closest image patch in the object model. The region having the highest confidence is considered the current
target if its score is larger than a threshold $\theta = 80\%$. However, in practice, there are several other
regions satisfying this condition. After we choose the best candidate as the tracking result, all other responses
are associated with the distracter trackers using greedy association: the tracker producing higher confidence on
a patch is associated with higher priority. The remaining regions trigger new distracter trackers. These trackers
are formulated similarly to our basic tracker. However, to avoid an increasing number of unnecessary trackers,
they are terminated whenever they lose their target.
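The confidence computation and target validation can be sketched as follows. This version searches the object model by brute force, which is the O(n) baseline rather than the k-means tree described in Section 8.3, and it compares the raw NCC value against 0.8; whether the report rescales NCC before thresholding is not stated, so that is an assumption, as are the function names.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def confidence(sample, model):
    """NCC with the closest (best matching) template in the object model."""
    return max(ncc(sample, t) for t in model)


def select_target(candidates, model, theta=0.8):
    """Return (index, score) of the best candidate, with index None if below theta."""
    if not candidates:
        return None, 0.0
    scores = [confidence(c, model) for c in candidates]
    best = int(np.argmax(scores))
    return (best, scores[best]) if scores[best] > theta else (None, scores[best])
```

Candidates other than the selected one that also exceed the threshold are the ones that would be handed to the distracter trackers.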

Assuming that we have the valid target at frame $t$, the supporters are extracted around the location of
that target within a radius $R$. After that, a sliding window of $k = 5$ frames is used to store and match the
previous supporters with the current ones. Each match increases the frequency of that supporter by 1.

In  practice,  there  are  several  candidates  similar  to  our  target  with  very  high  confidence  score.  In  fact,  the 
right  candidate  may  not  even  obtain  the  highest  score,  especially  when  the  appearance  is  changing  (as  shown 
(a) Start changing appearance  (b) Exceeding the threshold, drifting.


Figure  3.21:  Drifting  to  another  object  with  the  highest  score. 


(a)  PNTracker  [75]  on  MultipleFaces  sequence 


(b)  Our  context  tracker  on  MultipleFaces  sequence 


(c)  PNTracker  [75]  on  Babies  sequence  (d)  Our  context  tracker  on  Babies  sequence 

Figure  3.22:  Comparison  between  PNTracker  [75]  and  our  context  tracker  on  challenging  sequences 


in Figure 3.21). Without context, the tracker obviously switches to the one with the highest score. Also,
in unconstrained environments, our target may leave the FoV or be completely occluded by other objects.
The tracker will then simply switch to another region satisfying the threshold $\theta$. Here, our tracker
automatically exploits all the distracters and pays attention to them by tracking them simultaneously. Also,
our tracker discovers a set of supporters to robustly identify the target among other similar regions.

8.4. Experiments

In our implementation, we use 8 ferns and 4 6bitBP features per fern. All thresholds are fixed as described.
The most important threshold is the one which validates the correct target. It is applied to the NCC
between a candidate and the online object model, and has been carefully chosen as 80% according to the
experiment in [74], which also showed that the LOOP event outperforms the other growing events.
Our scanning window starts searching at the minimum region size of 20x20. For a sequence of resolution 320x240,
the number of search windows is approximately 100k, while for 640x480 it is approximately 600k.

We compare our tracker with and without context elements. The PNTracker is used as the reference. It
is worth noting that the implementation of PNTracker (footnote 1) combines the implementations of [74, 75]. To
emphasize the contribution of context in terms of distracters and supporters, we choose two very challenging
sequences which contain similar moving objects: Multiplefaces and Babies. It is important to note that in
most of the cases where no strong context exists, our tracker still shows overall better results than PNTracker
and outperforms other state-of-the-art methods.

To demonstrate the performance of our context tracker, we use several recent state-of-the-art methods
for comparison, including FragTracker (FT) [2], MILTracker (MILT) [11], CotrainingTracker (CoTT) [193],
PNTracker [75], DNBSTracker (DNBST) [87], and VTDTracker [81]. All code comes from the original authors.

1 PNTracker: http://info.ee.surrey.ac.uk/Personal/Z.Kalal/




Video        Frames  FT           MILT         CoTT         DNBS         VTD          PNT          Ours
Animal           72  69           9            8            19           6            37           9
Carchase       5000  lost @ 355   lost @ 355   lost @ 409   lost @ 364   lost @ 357   lost @ 1645  24
Clutter        1528  lost @ 1081  lost @ 413   9            6            6            4            6
ETHPed          874  lost @ 95    lost @ 95    lost @ 95    lost @ 635   lost @ 95    10           16
Girl            502  lost @ 248   30           14           39           69           19           18
Liquor         1407  lost @ 47    lost @ 288   30           lost @ 404   lost @ 404   21           10
Motocross      2665  lost @ 137   lost @ 485   lost @ 591   lost @ 10    lost @ 10    10           12
Multifaces     1006  lost @ 64    lost @ 64    lost @ 394   lost @ 64    lost @ 64    lost @ 97    26
Scale          1911  8            11           6            lost @ 269   3            6            2
Vehicle         946  lost @ 679   lost @ 481   9            lost @ 517   lost @ 517   8            8
Speed             -  1.6          14           2            7            0.2          12*          10

Table 3.8: Average center location error (pixels). Performance comparison between the trackers (FT:
Frag-Tracker [2], MILT: MILTracker [11], CoTT: Co-Tracker [193], DNBS: DNBSTracker [87], VTD: Visual
Tracking Decomposition [81], PNT: PNTracker [75], and Ours: our context tracker) on different challenging
video sequences. The best performance is in bold, the second best is in italic. The number in blue color
indicates the frame number when the tracker gets lost. The * indicates that the method was implemented in
Matlab using C-Mex.


The chosen data set includes several challenging sequences: Motocross and Carchase in [75], Vehicle
in [193], Liquor (footnote 3), ETHPedestrian (footnote 4), Multifaces (footnote 2), Clutter and Scale
(footnote 5), Animal used in [81], and Girl in [11]. They contain occlusion and the object leaving the FoV
(Motocross, Carchase, Vehicle, ETHPedestrian, Multifaces, Girl), very cluttered background (Carchase, Liquor,
ETHPedestrian, Multifaces), out-of-plane rotation (Carchase, Vehicle, Multifaces, Girl), abrupt motion
(Motocross, Clutter, Scale, Animal), and motion blur (Liquor, Clutter, Animal). Several of them are recorded in
unconstrained environments, such as Motocross, Vehicle, ETHPedestrian, Carchase, and Animal.

Because our chosen data set is very challenging, with a number of long-term sequences, most current
methods fail somewhere in the middle of a sequence. Therefore, we note the frame number where a tracker
starts to lose the object and never reacquires it. This means we accept the result of a tracker even when it
fails to get the right target for several frames before reacquisition happens. A target is considered “lost” if the
overlapping region between its bounding box and the ground truth is less than 50%.
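The two quantities reported in Table 3.8 can be illustrated with a short sketch. It assumes boxes in `(x, y, w, h)` form and takes the per-frame overlap values as an input, since the report does not say whether overlap is measured relative to the ground-truth box or as intersection over union; the function names are hypothetical.

```python
import math


def center_error(track_box, gt_box):
    """Euclidean distance in pixels between box centers; boxes are (x, y, w, h)."""
    tx, ty, tw, th = track_box
    gx, gy, gw, gh = gt_box
    return math.hypot((tx + tw / 2) - (gx + gw / 2), (ty + th / 2) - (gy + gh / 2))


def lost_frame(overlaps, min_overlap=0.5):
    """Index of the first frame after which the tracker never recovers, else None.

    Temporary failures followed by a later reacquisition are accepted, so only
    the final run of frames with overlap below `min_overlap` counts as lost.
    """
    last_ok = -1
    for i, o in enumerate(overlaps):
        if o >= min_overlap:
            last_ok = i
    return None if last_ok == len(overlaps) - 1 else last_ok + 1
```

For example, a tracker that fails in one frame, recovers, and then fails for good is reported as lost at the start of the final failure, not at the first one.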

The quantitative comparisons are shown in Table 3.8. The running time comparison (in the last row)
is only a rough reference, as different methods have different search ranges, which impacts the speed greatly.
The table shows that our tracker has overall better performance than the PNTracker of [75] with the help of
context, and outperforms all other approaches. Although most of them may work well in controlled environments,
it is difficult for them to consistently follow the target in long-term sequences and in unconstrained
environments. Some numbers in our results are large (“Carchase”, “Motocross”) because our tracker reacquires
the object several frames later than the ground truth, which inflates the overall score when the error is
calculated using its previous position.

9. Active Vision System Using a Network Pan-Tilt-Zoom Camera

Research on human faces is an important topic in computer vision, with various applications such as biometric identification. Regular CCTV cameras cannot capture human faces at reasonable resolution when people are far away. PTZ cameras can zoom, pan, and tilt to get a close view. Today, this process only occurs under human control, which is impractical with a large number of cameras. Thus,

3 PROST dataset: http://gpu4vision.icg.tugraz.at/index.php?content=subsites/prost/prost.php

4 ETH dataset: http://www.vision.ee.ethz.ch/~aess/dataset/

5 http://www.vision.ee.ethz.ch/boostingTrackers/index.htm
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(c)  Motocross  sequence  (d)  Carchase  sequence 


Figure  3.23:  Some  snapshots  of  our  context  tracker  on  several  sequences. 


Figure  3.24:  Scenario 


automatically generating high-resolution face sequences from PTZ cameras would be very helpful. Many challenges, such as network delays, packet loss, and slow response to commands, still need to be addressed. Given such a scenario, we need not only a robust real-time tracker but also a flexible and smooth control module. The tracker needs to learn the face appearance changes under different conditions. It also has to be robust against cluttered background, abrupt motion, and motion blur, and must reacquire the object after total occlusion or after it leaves the FOV. In addition, when a PTZ camera is in wide mode, it performs as a regular CCTV camera, which means that when a person is far from the camera, no face detector can detect a face that occupies only a few pixels (as shown in Figure 3.24). A practical system should automatically identify the region of interest containing the face, zoom in to detect the face, and then track it. To address all of the above issues, we present an autonomous system running in three different image modes and two camera control modes. The overview is shown in Figure 3.25, where the operators acting on the image and on the camera are illustrated in light blue and light orange boxes, respectively. Our system uses a PTZ network camera to capture image sequences. The detector in pedestrian detection mode detects people in the FOV. The camera then switches to ROI focusing mode in order to zoom in on the upper part of the detected human body. After that, the system goes into face
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Figure  3.25:  Overview  of  our  system 


Figure  3.26:  One-step-back  strategy  camera  control 


detection mode to find the face of interest, which was hard to detect from far away at a wide-angle focal length (Figure 3.24). An active tracking mode, consisting of a control loop with two interchangeable modules (camera control and tracking), is then triggered to follow the target by keeping it around the center of the image at a proper predefined size.

9.1.  Pedestrian  Detection  Module 

Many methods model the background and then apply a background subtraction algorithm to detect moving objects. However, it is impractical to model the background under all possible variations of the PTZ camera parameters. Moreover, the environmental conditions can change at any time. In practice, pedestrians are not the only moving objects; there are also vehicles and foliage moving in the wind. Hence, we generalize the problem by employing a frame-based state-of-the-art pedestrian detector to find a walking person in the camera's FOV. Several pedestrian detection methods have been proposed, such as [42, 187]. However, most of them do not have near real-time performance. Recently, Huang et al. [70] proposed a high-performance object detector using joint ranking of granules (JRoG) features. With a simple ground plane estimation, this detector takes only 70 ms to scan a 640x480 test image at 16 different scales. JRoG is a simplified version of the APCF [50] descriptor, obtained by excluding gradient granules. For feature selection, two complementary modules, an SA module and an incremental module, were proposed to find the optimal solution. After that, the part-based detection approach [187] is adopted to find the optimal combination of multiple partitions of body parts in the complex scene. For more details please refer to [70]. To avoid false alarms, we use a simple frame-difference background subtraction technique to filter out most of the static regions in the background. Also, it is important to note that we do not need to run the pedestrian detector on every frame; instead, we run it once per second.
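The false-alarm filtering and the once-per-second schedule can be sketched as below. This is an illustration under stated assumptions, not the system's implementation: the thresholds `diff_threshold` and `min_moving_fraction`, the function names, and the (x, y, w, h) box format are hypothetical, and the frame rate of 15 fps is taken from the experiments section.

```python
import numpy as np


def motion_mask(prev_gray, curr_gray, diff_threshold=15):
    """Boolean mask of pixels whose absolute intensity change exceeds the threshold."""
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff > diff_threshold


def keep_moving_detections(boxes, mask, min_moving_fraction=0.1):
    """Discard (x, y, w, h) detections whose window is mostly static."""
    kept = []
    for x, y, w, h in boxes:
        window = mask[y:y + h, x:x + w]
        if window.size and window.mean() >= min_moving_fraction:
            kept.append((x, y, w, h))
    return kept


def should_run_detector(frame_index, fps=15):
    """Run the pedestrian detector once per second of video."""
    return frame_index % fps == 0
```

Casting to a signed type before subtracting avoids the wrap-around that unsigned 8-bit images would otherwise produce.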

9.2. ROI Focusing Module

After receiving the results from the pedestrian detector module, the system automatically chooses the highest-confidence response and switches to the ROI focusing mode. The head of a person is roughly 1/5 of the total body height. In this mode, the camera parameters are adjusted so that the head position is close to the center of the image while its height is in the defined range. More precisely, let $C(c_x, c_y)$ denote the center of the image, and let $P(p_x, p_y)$ and $h_p$ denote the center and the height of the ROI in the current state. The camera pans, tilts, and zooms so that on the image plane:


$$\begin{cases} P \to C \\ h_p \to \mathrm{min}_h & \text{if } h_p < \mathrm{min}_h \\ h_p \to \mathrm{max}_h & \text{if } h_p > \mathrm{max}_h \end{cases} \tag{3.25}$$
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Figure  3.27:  Indoor  experiment  results 


where $\mathrm{min}_h = 80$ and $\mathrm{max}_h = 120$ are the minimum and maximum heights of the face, respectively. Because this function only needs to be performed once, absolute commands are sent to control the camera. The output of this mode is a new video stream in which a high-resolution face is likely to appear. It is important to note that the new video stream may contain no face at all, for example because of a false alarm from the pedestrian detector, a person facing away from the camera, or a fast-moving pedestrian.
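The ROI focusing rule can be illustrated with a short sketch. It assumes that the head occupies the top fifth of the detected pedestrian box and that apparent height scales linearly with the zoom factor; the function name and return convention are hypothetical, and converting the pixel offset into pan and tilt angles is camera specific and omitted.

```python
def roi_focus_command(person_box, image_size=(640, 480), min_h=80, max_h=120):
    """Return (dx, dy, zoom_factor) for a pedestrian box (x, y, w, h).

    (dx, dy) is the offset of the head ROI center P from the image center C,
    in pixels; driving it to zero realizes P -> C. zoom_factor is the
    magnification change that brings the head height into [min_h, max_h].
    """
    x, y, w, h = person_box
    hp = h / 5.0                          # head is roughly 1/5 of the body height
    px, py = x + w / 2.0, y + hp / 2.0    # center of the head ROI (top of the box)
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    if hp < min_h:
        zoom = min_h / hp                 # zoom in until the head reaches min_h
    elif hp > max_h:
        zoom = max_h / hp                 # zoom out until the head shrinks to max_h
    else:
        zoom = 1.0                        # height already in range
    return px - cx, py - cy, zoom
```

For a pedestrian box of height 100 pixels, the head ROI is about 20 pixels tall, so the sketch requests a fourfold magnification.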

9.3.  Face  Detection  Module 

This mode is in charge of detecting the face in the current image. We employ the real-time face detector proposed by Froba and Ernst [56]. This method introduces a new feature set, illumination-invariant Local Structure Features, for object detection. The features are computed from a 3x3 pixel neighborhood using a modified version of the Census Transform [196] for efficient computation. A cascade of four classifier stages is adopted, each of which is a linear classifier consisting of a set of lookup tables of feature weights. Detection is then carried out by scanning all possible windows of a fixed size of 22x22 pixels, repeatedly downscaling the image by a factor of 1.25 until it is smaller than 22x22. Among the set of detected responses, the best one is chosen as the target. This face detector runs at 20 fps on 640x480 image sequences and can detect faces with less than 45° out-of-plane rotation. For more information please refer to the original work in [56].
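To give a feel for the size of the multi-scale scan, the sketch below enumerates the image pyramid implied by a 22x22 window and a scaling factor of 1.25. It covers only the scanning geometry, not the detector of [56]; the function names, the truncation of image sizes to integers, and the one-pixel stride are assumptions.

```python
def pyramid_sizes(width=640, height=480, window=22, factor=1.25):
    """Image sizes visited when downscaling by `factor` until the image
    can no longer contain a window x window patch."""
    sizes = []
    w, h = float(width), float(height)
    while int(w) >= window and int(h) >= window:
        sizes.append((int(w), int(h)))
        w, h = w / factor, h / factor
    return sizes


def windows_per_level(size, window=22, stride=1):
    """Number of window positions in an image of the given (width, height)."""
    w, h = size
    return ((w - window) // stride + 1) * ((h - window) // stride + 1)
```

Under these assumptions a 640x480 frame yields 14 pyramid levels, and the full-resolution level alone contains 619 x 459 window positions, which is why the cheap lookup-table classifiers of the cascade matter for real-time operation.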

Once  the  face  is  detected,  the  active  tracking  mode  is  triggered  with  two  modules:  camera  control  and 
tracking. 

9.4.  Camera  Control  Module 


(a)  Outdoor  environment  with  enough  lighting 


(b)  Outdoor  environment  with  low-light  and  noisy,  out-of-focus  images 


Figure  3.28:  Outdoor  experiment  results 

This mode automatically controls the camera to follow the object. The control needs to be smooth and precise. We use relative control commands with a truncated negative feedback to avoid oscillation. When sending commands to the camera, we adopt a time-sharing strategy in which one pan-tilt command and one zoom command are packed as one sending group, and two sending groups with one inquiry command make a control loop. This is because the camera queue is limited and we should not send commands too frequently; otherwise, when the queue is full, later commands are dropped. Moreover, the delay in responding to the inquiry command is longer than for the others, so a proper time-sharing strategy is helpful. Note that most PTZ cameras carry out commands by mechanical processes, which accumulate errors over time. This prevents us from computing the current camera status from the initial status by accumulating the differences at every step.
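One possible reading of the control scheme is sketched below. The text does not define "truncated negative feedback" precisely; here it is assumed to mean a proportional command opposing the error and clipped to a maximum magnitude. The gain, the clipping limit, and the command labels are hypothetical.

```python
def truncated_feedback(error, gain=0.5, max_step=5.0):
    """Relative command that opposes `error`, clipped to [-max_step, max_step]."""
    command = -gain * error
    return max(-max_step, min(max_step, command))


def control_loop_schedule():
    """One control loop: two sending groups (pan-tilt + zoom) and one inquiry."""
    group = ["pan_tilt", "zoom"]
    return group + group + ["inquiry"]
```

Clipping bounds the size of each relative move, so a large error is corrected over several loops rather than in one overshooting step.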

In practice, it is hard to follow the face all of the time, especially when the focal length is long, i.e., when the camera zooms in on a faraway face. The FOV of the camera is then narrow, and the face can move out of the image in 2-3 frames. To address this issue, we introduce a one-step-back strategy (as shown in Figure 3.26). We divide the zoom range into nine steps covering 1X to 18X. We ignore higher zoom levels because the FOV is too narrow there. Whenever the face has not been tracked for k = 20 consecutive frames, the system zooms out one step in the hope of reacquiring the face. The system repeats this process until it reacquires the face (using the tracker, not the detector) or the camera returns to the widest mode. The system returns to the preset home position after 10 seconds without detection.
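The one-step-back strategy amounts to a small state machine, sketched below. The class and attribute names are hypothetical, zoom steps are represented only by their index (0 is the widest), and the 10-second return to the home position is left out.

```python
class OneStepBackZoom:
    """Zoom out one step after `lost_limit` consecutive frames without a track."""

    def __init__(self, start_step, num_steps=9, lost_limit=20):
        if not 0 <= start_step < num_steps:
            raise ValueError("start_step must index one of the zoom steps")
        self.step = start_step
        self.lost_limit = lost_limit
        self.lost_frames = 0

    def update(self, tracked):
        """Feed one frame's tracking result; return the zoom step to use."""
        if tracked:
            self.lost_frames = 0
            return self.step
        self.lost_frames += 1
        if self.lost_frames >= self.lost_limit and self.step > 0:
            self.step -= 1          # step back toward the widest mode
            self.lost_frames = 0    # give the tracker a fresh window
        return self.step
```

Resetting the counter after each step back gives the tracker another 20 frames at the wider view before the next step.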

9.5.  Tracking  Module 

We use a simplified version of our context tracker described in the previous section, reducing the number of ferns and the maximum number of distracters and supporters allowed.

9.6.  Experiments 

We use a Sony SNC-RZ30N PTZ network camera. This camera covers a large range of pan angles (−170° to +170°) and tilt angles (−90° to +25°), and has a large zoom ratio (25X optical). The camera image stream sent to the computer is set at 640x480 with medium quality. Our complete system runs at 15 fps. The complete system has been tested in challenging indoor and outdoor environments with real-life situations. Our system successfully detects a walking person, zooms in on the face, and keeps tracking it for a while. The indoor setting (see Figure 3.27) is very challenging because of the face leaving the FOV, large viewpoint changes, and changes in lighting conditions. For the outdoor environment, we tackle two different situations: one in the afternoon with enough lighting, and one in the evening with low light. The results show that even when the camera cannot focus on the face in auto-focus mode and there is a lot of noise, our system still delivers acceptable results. Some snapshots are shown in Figure 3.28. It is important to note that, to reduce the overhead of recording images, we only save the image sequences after the face is detected and tracking starts. For detailed results, please refer to our supplemental video.

10.  Logic  Models  for  Image  and  Video  Tracking 

This research was presented at the IASTED Conf. on Signal and Image Processing. The work was carried out by UCLA master's student James H. von Brecht and UCLA postdoc Sheshadri R. Thiruvenkadam under the direction of Professor Tony Chan. In this work, we present a variational method for tracking objects under occlusion. We utilize prior shape information and the Logic Models of Sandberg and Chan to locate and segment the object of interest in each frame of the video sequence. In particular, we demonstrate how incorporating the shape prior via the Logic Models allows us to avoid the segmentation local minima that occur with algorithms that simply introduce shape additively, and how to use the Logic Models within the context of tracking.

We model the object in the current frame as an affine transformation $g$ of a shape prior, which we represent via a level set function $\psi$. We minimize two separate segmentation energies to find both the appropriate affine registration parameters $p$ and the correct segmentation, represented as the zero level-set of a function $\phi$. Each energy corresponds to a different logical interpretation of the object in the frame to be analyzed. Both of the energies take the familiar form

$$E = \int f_{in}\, H(\phi) + \int f_{out}\, (1 - H(\phi)).$$
The functions $f_{in}$ and $f_{out}$ are image terms which drive the segmentation. The shape prior is incorporated as a binary image into each term. Since the shape and image terms are coupled, we avoid the local minima problems encountered when shape is introduced additively. The image terms vary depending upon the correct logical interpretation (AND/OR) of the image to be analyzed; hence we minimize each of the two energies separately, then select one of the two as the correct segmentation. Lastly, we introduce a means of automatically determining the correct selection, thus allowing us to utilize the Logic Models within a tracking algorithm. An example of the algorithm is shown in Figure 3.29.
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Figure  3.29:  Occlusion  tracking  from  two  video  sequences 


Work by postdoc Berta Sandberg and PhD student Matthew Keegan, under the direction of Tony Chan, has the goal of increasing the speed of detection of an object of interest while decreasing the false detection rate. Two algorithms were developed toward this objective. The first extracts the moving object from the background. A logic framework model was developed for multiphase segmentation of images that combines the intensity of the region where movement occurs with the velocity vector field to identify the best possible view of the moving object. The second is a clustering algorithm that identifies and sorts the shapes of objects using calculated shape features. By extracting the precise shape of the object and identifying its shape features, we hope to differentiate between moving objects of interest and noise such as trees blowing in the wind or rain and snow.
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Chapter  4 

Sensor  Management  for  Tracking 


This chapter summarizes the contributions of the group of Dr. Veeravalli to the theory of sensor management for tracking made with the support of this grant. Our work focused on two variants of the sensor management problem, namely, sensor scheduling and sensor sleeping. The former refers to the scenario where the sensors can be turned on or off at consecutive time steps and the goal is to select the subset of sensors to activate at each time step. The latter refers to the scenario where an asleep sensor cannot be communicated with or woken up, and hence the sleep duration needs to be determined at the time the sensor goes to sleep based on all the information available to the sensor.

1.  Sensor  Sleeping 

The  sensor  nodes  typically  need  to  operate  on  limited  energy  budgets.  In  order  to  conserve  energy,  the 
sensors  may  be  put  into  a  sleep  mode.  However,  it  is  clear  that  having  sleeping  sensors  in  the  network  that 
cannot  be  woken  up  could  result  in  tracking  errors,  and  hence  there  is  a  tradeoff  between  the  energy  savings 
and  the  tracking  error  that  results  from  the  sleeping  at  the  sensors.  The  sleeping  policies  at  the  sensors 
should  be  designed  to  optimize  this  tradeoff. 

A  straightforward  approach  is  to  have  each  sensor  enter  and  exit  the  sleep  mode  using  a  fixed  or  a 
random  duty  cycle.  A  more  intelligent,  albeit  more  complicated,  approach  is  to  use  information  about  the 
object  trajectory  that  is  available  to  the  sensor  from  the  network  to  determine  the  sleeping  strategy.  In 
particular,  it  is  easy  to  see  that  the  location  of  the  object  (if  known)  at  the  time  when  the  sensor  is  put  to 
sleep  would  be  useful  in  determining  the  sleep  duration  of  the  sensor;  the  closer  the  object,  the  shorter  the 
sleep  duration  should  be.  We  took  this  latter  approach  in  this  work  in  designing  sleeping  strategies  for  the 
sensors.  All  information  about  the  object  trajectory  is  stored  at  some  central  unit  and  is  used  to  determine 
sleep  times  of  sensors  that  come  awake.  Using  a  bottom-up  approach,  we  consider  different  sensing,  motion 
and  cost  models  with  increasing  levels  of  difficulty. 

1.1.  Simplified  Models 

We study the problem of tracking an object that is moving through a network of wireless sensors as shown in Figure 4.1. Each sensor has a limited range for detecting the presence of the object being tracked, and the objective is to track the location of the object to within the accuracy of the range of the sensor. First we considered a simplified model for a sensor network with $n$ sensors [58]. We assume that the sensing ranges of the sensors completely cover the region of interest with no overlap. In other words, the region can be divided into $n$ cells with each cell corresponding to the sensing range of a particular sensor. Each sensor can be in one of two states: awake or asleep. A sensor in the awake state consumes more energy than one in the asleep state. However, object sensing can be performed only in the awake state.

Figure 4.1: Object tracking in a field of sensors (simplified model)

The movement of the object to be tracked is described by a Markov chain whose state is the current location of the object to within the accuracy of a cell. We also append a special terminal state, denoted $T$, that occurs when the object leaves the network. The statistics of the object movement are described by an $(n+1) \times (n+1)$ probability transition matrix $P$ such that $P_{ij}$ is the probability of the object being in state $j$ at the next time step given that it is currently in state $i$. Let $b_k$ denote the location of the object at time $k$. We can describe the evolution of the object location stochastically as $b_{k+1} \sim e_{b_k} P$, where $e_i$ is a row vector with a 1 in the $i$-th entry and zeros elsewhere.
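The object dynamics can be simulated directly from the transition matrix. The sketch below is illustrative only: it uses zero-based cell indices, takes the last row and column of $P$ to be the terminal state $T$ (an indexing convention assumed here), and the function name is hypothetical.

```python
import numpy as np


def simulate_object(P, start, steps, rng):
    """Sample b_0, b_1, ... with b_{k+1} ~ e_{b_k} P, stopping at the terminal state.

    P is an (n+1) x (n+1) row-stochastic matrix whose last index is T.
    """
    P = np.asarray(P, dtype=float)
    terminal = P.shape[0] - 1
    path = [start]
    for _ in range(steps):
        if path[-1] == terminal:
            break                      # the object has left the network
        # Row b_k of P is exactly the distribution e_{b_k} P.
        path.append(int(rng.choice(P.shape[0], p=P[path[-1]])))
    return path
```

A call such as `simulate_object(P, 0, 100, np.random.default_rng(0))` returns one sampled trajectory, which ends early if the object reaches $T$.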

A central controller keeps track of the state of the network and assigns sleep times to sensors that are awake. In particular, each sensor that wakes up remains awake for one time unit during which the following actions are taken: (i) if the object is within its range, the sensor detects the object and sends this information to the central unit, and (ii) the sensor receives a new sleep time (which may equal zero) from the central controller. The sleep time input is used to initialize a timer at the sensor that is decremented by one time unit each time step. When this timer expires, the sensor wakes up. Let $r_{k,\ell}$ denote the value of the sleep timer of sensor $\ell$ at time $k$. We call the $n$-vector $r_k$ the residual sleep times of the sensors at time $k$. Also, let $u_{k,\ell}$ denote the sleep time input supplied to sensor $\ell$ at time $k$. We can describe the evolution of the residual sleep times as

$$r_{k+1,\ell} = (r_{k,\ell} - 1)\,\mathbb{1}\{r_{k,\ell} > 0\} + u_{k,\ell}\,\mathbb{1}\{r_{k,\ell} = 0\} \tag{4.1}$$

The  first  term  on  the  right  hand  side  of  this  equation  expresses  that  if  the  sensor  is  currently  asleep  (the  sleep 
timer  for  the  sensor  is  not  zero),  the  sleep  timer  is  decremented  by  1.  The  second  term  expresses  that  if  the 
sensor  is  currently  awake  (the  sleep  timer  is  zero),  the  sleep  timer  is  reset  to  the  current  sleep  time  input  for 
that  sensor. 
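The timer recursion (4.1) is a one-line vector operation. The following sketch uses NumPy; the function name is hypothetical.

```python
import numpy as np


def update_sleep_timers(r, u):
    """Apply (4.1): asleep sensors (r > 0) count down by one,
    awake sensors (r == 0) load their new sleep time input u."""
    r = np.asarray(r)
    u = np.asarray(u)
    return np.where(r > 0, r - 1, u)
```

With timers `[3, 0, 1, 0]` and inputs `[9, 2, 9, 0]`, the sleeping sensors ignore their inputs and the result is `[2, 2, 0, 0]`: the second sensor goes to sleep for two steps, and the fourth stays awake because its input is zero.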

Hence, our system is described through a discrete-time dynamical model, with control input $u_k$ and exogenous input $w_k$. The state of the system at time $k$ is described by $x_k = (b_k, r_k)$, and the state evolution is given by the object dynamics above together with (4.1). Not all of $x_k$ is known to the central unit at time $k$, since $b_k$ is known only if the object is currently being tracked. Thus we have a dynamical system with incomplete (or partially observed) state information. If we denote the observation available to the central unit at time $k$ by $z_k$, then $z_k = (s_k, r_k)$, with

$$s_k = \begin{cases} b_k & \text{if } b_k \neq T \text{ and } r_{k,b_k} = 0 \\ \varepsilon & \text{if } b_k \neq T \text{ and } r_{k,b_k} > 0 \\ T & \text{if } b_k = T \end{cases} \tag{4.2}$$

where $\varepsilon$ denotes an unknown or “erasure” value. The total information available to the control unit at time $k$ is given by

$$I_k = (z_0, \ldots, z_k, u_0, \ldots, u_{k-1}) \tag{4.3}$$

with $I_0 = z_0$ denoting the initial (known) state of the system. The control input for sensor $\ell$ at time $k$ is allowed to be a function of $I_k$, i.e.,

$$u_k = \mu_k(I_k) \tag{4.4}$$

The vector-valued function $\mu_k$ is the sleeping policy at time $k$.
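The observation model (4.2) can be written as a small function. In this sketch the terminal state and the erasure value are represented by the hypothetical string constants `TERMINAL` and `ERASURE`, and cells are indexed from zero.

```python
TERMINAL = "T"        # stand-in for the terminal state T
ERASURE = "erasure"   # stand-in for the unknown value in (4.2)


def observe_location(b, r):
    """Return s_k from (4.2) for object location b and residual sleep times r."""
    if b == TERMINAL:
        return TERMINAL             # departure from the network is always observed
    return b if r[b] == 0 else ERASURE
```

The object's cell is reported only when the sensor covering that cell is awake; every other non-terminal case yields the erasure value.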

We identify the two costs present in our tracking problem. The first is an energy cost of $c \in (0, 1]$ for each sensor that is awake. The second is a tracking cost of 1 for each time unit that the object is not tracked. If the object leaves the network ($b_k$ enters the terminal state), we assume the problem terminates and no further cost is incurred. Thus, the total cost at time $k$ is given by

$$g(x_k) = \mathbb{1}\{b_k \neq T\} \left( \mathbb{1}\{r_{k,b_k} > 0\} + \sum_{\ell=1}^{n} c\,\mathbb{1}\{r_{k,\ell} = 0\} \right) \tag{4.5}$$
where $c$ is the parameter used to trade off energy consumption and tracking errors. The total cost (over a possibly infinite-horizon trajectory) for the system is given by

$$J(I_0, \mu_0, \mu_1, \ldots) = E\left[ \sum_{k=1}^{\infty} g(x_k) \,\Big|\, I_0 \right] \tag{4.6}$$

Since $g$ is bounded by $(cn + 1)$ and the expected time until the object leaves the network is finite, the cost function $J$ is well-defined. The goal is to compute the solution to

$$J^*(I_0) = \min_{\mu_0, \mu_1, \ldots} J(I_0, \mu_0, \mu_1, \ldots) \tag{4.7}$$

The solution to this optimization problem for each value of $c$ yields an optimal sleeping policy. The optimization problem falls under the framework of partially observable Markov decision processes (POMDPs).
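For concreteness, the per-step cost (4.5) can be evaluated as in the sketch below. The function name and the convention of passing the terminal state's index explicitly are assumptions made for illustration.

```python
def stage_cost(b, r, c, terminal):
    """Evaluate g(x_k) from (4.5) for location b and residual sleep times r."""
    if b == terminal:
        return 0.0                                   # no cost once the object has left
    tracking = 1.0 if r[b] > 0 else 0.0              # sensor covering the object is asleep
    energy = c * sum(1 for timer in r if timer == 0)  # c per awake sensor
    return tracking + energy
```

With three sensors, `c = 0.1`, timers `[0, 2, 0]`, and the object in the cell of the sleeping sensor, the cost is one tracking error plus two awake sensors, i.e. 1.2. The value can never exceed $cn + 1$, which is the bound used above.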

Partial observability presents a problem since the information for decision-making at time $k$ given in (4.3) is unbounded in memory. To remedy this, we seek a sufficient statistic for optimization that is bounded in memory. It is a standard argument that for such an observation model, a sufficient statistic is given by the probability distribution of the state $x_k$ given $I_k$. Such a sufficient statistic is referred to as a belief state. Since the residual sleep times portion of our state is observable, the sufficient statistic can be written as $v_k = (p_k, r_k)$, where $p_k$ is a row vector of length $n + 1$ that denotes the probability distribution of $b_k$ given $I_k$. Mathematically, we have

$$p_{k,\ell} = P(b_k = \ell \mid I_k) \tag{4.8}$$

We can write the evolution of $p_k$ as

$$p_{k+1} = e_T\,\mathbb{1}\{b_{k+1} = T\} + e_{b_{k+1}}\,\mathbb{1}\{r_{k+1,b_{k+1}} = 0\} + [p_k P]_{\{j : r_{k+1,j} > 0\}}\,\mathbb{1}\{r_{k+1,b_{k+1}} > 0\} \tag{4.9}$$

where $r_{k+1}$ is defined through (4.1) and $b_{k+1}$ (conditioned on $p_k$) is distributed as

$$b_{k+1} \sim p_k P \tag{4.10}$$


To understand (4.9), note that if the object is observed at time $k+1$, $p_{k+1}$ becomes a point-mass distribution with all the probability mass concentrated at $b_{k+1}$. If the object is not observed, we eliminate all probability mass at sensors that are awake (since the object is known not to be at these locations) and renormalize. Thus, all information from observations is incorporated.
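The belief update just described can be sketched as follows. The function name and argument layout are hypothetical; the last entry of the belief vector is taken to be the terminal state, and an observation of `None` stands for the erasure value.

```python
import numpy as np


def belief_update(p, P, r_next, observed_location=None):
    """One step of (4.9).

    p: belief over n cells plus the terminal state (length n + 1).
    P: (n+1) x (n+1) transition matrix, terminal state last.
    r_next: residual sleep times of the n sensors at time k + 1.
    observed_location: index of the observed state (a cell or the terminal
        state), or None when the observation is an erasure.
    """
    p = np.asarray(p, dtype=float)
    if observed_location is not None:
        point_mass = np.zeros(len(p))
        point_mass[observed_location] = 1.0
        return point_mass
    predicted = p @ np.asarray(P, dtype=float)
    # An erasure means the object is in the cell of a sleeping sensor:
    # awake cells and the terminal state are ruled out.
    asleep = np.zeros(len(p), dtype=bool)
    asleep[:-1] = np.asarray(r_next) > 0
    conditioned = np.where(asleep, predicted, 0.0)
    total = conditioned.sum()
    if total == 0.0:
        raise ValueError("erasure is impossible under this belief and timer state")
    return conditioned / total
```

The terminal state is zeroed out in the erasure case because, by (4.2), departure from the network is always observed.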

We can then write our policy and cost function in terms of the sufficient statistic, and the optimal cost defined in (4.7) becomes

$$J^*(v_0) = \min_{\mu_0, \mu_1, \ldots} J(v_0, \mu_0, \mu_1, \ldots) \tag{4.11}$$

There exists a stationary optimal policy for our problem (i.e., $\mu_0 = \mu_1 = \cdots = \mu^*$). Such a policy and the optimal cost $J^*$ can be found by solving the Bellman equation given as

$$J(v) = \min_{\mu} E\left[ g(x_1) + J(v_1) \mid v_0 = v,\ u_0 = \mu(v) \right] \tag{4.12}$$

with $\mu^*$ being the minimizing value of $\mu$ in this equation. The state space for this problem consists of $p_k$, which is uncountably infinite, and $r_k$, which is countably infinite. Thus, we must either have an analytical solution for the cost function at each iteration (which is not possible given the complexity of the problem) or we must quantize and truncate the state space so that there are a finite number of states. Of course, restricting the infinite state space to a finite state space will lead to some loss of optimality. Even with the restriction to a finite state space, value iteration remains intractable except for the most trivial cases. This is because the state space grows exponentially with the number of sensors. For example, even with seven sensors, a maximum sleep time of 10, and a probability mass function quantized to multiples of 0.1, there are about $10^9$ possible states $v_k$. Hence, finding an optimal solution to this problem is not feasible. During the course of this MURI program we identified other approaches which were shown to be near-optimal.
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1.1.1.  Approximate  Sleeping  Policies 

Much of the complexity of this problem stems from the complicated evolution of $p_k$ given in (4.9). In deriving suboptimal solutions to our problem, we make assumptions about the observations that will be available in the future. These assumptions allow us to simplify the evolution in (4.9) considerably. The result is that the optimization problem easily separates into $n$ simpler problems, one for each sensor. In each of these simpler problems, we are able to eliminate the residual sleep times $r_k$ from the state since the only times of interest will be those when the sensor comes awake. It is then possible to solve each of the $n$ simpler problems to find a cost function and policy.

We now define some additional notation. Under each assumption, we refer to an optimal cost and policy as $J^*$ and $\mu^*$. These functions may be written in terms of the state ($J^*(v)$, $\mu^*(v)$) or the component parts of the state ($J^*(p, r)$, $\mu^*(p, r)$). The optimal cost and policy for the simpler problem for sensor $\ell$ will be $J^{*(\ell)}$ and $\mu^{*(\ell)}$, respectively. These functions are written in terms of the $p$ portion of the state ($J^{*(\ell)}(p)$, $\mu^{*(\ell)}(p)$). Define $J^{*(\ell)}(p, r)$ for $r > 0$ as


r—  1 


3= 1 


i=  1 


(4.13) 


We showed for each assumption that with these definitions, $J^*$ defined through

$$J^*(v) = J^*(p, r) = \sum_{\ell=1}^{n} J^{*(\ell)}(p, r_\ell) \tag{4.14}$$

is a solution to (4.12). It is also shown that if $\mu^{*(\ell)}(p)$ is set to the minimizing value of $u$ for all $p$ and for all $\ell \in \{1, \ldots, n\}$, then $\mu^*$ defined through

$$\mu^*(v) = \mu^*(p, r) = [\mu^{*(1)}(p), \mu^{*(2)}(p), \ldots, \mu^{*(n)}(p)] \tag{4.15}$$

is the minimizing policy in (4.12). In other words, the sleep times for each sensor can be chosen independently as a function of $p$ alone.

1.1.2.  FCR  Solution 

To generate the first cost reduction (FCR) solution, we assume that we will have no future observations. In other words, we are replacing (4.9) with

$$p_{k+1} = p_k P \tag{4.16}$$

Note that this does not mean that it will be impossible to track the object; we are simply making an assumption about the future state evolution in order to generate a sleeping policy. We can solve the equation


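$$J^{(\ell)}(p) = \min_{u} \left( \sum_{j=1}^{u} [pP^j]_\ell + \sum_{i=1}^{n} c\,[pP^{u+1}]_i + J^{(\ell)}(pP^{u+1}) \right) \tag{4.17}$$

to find the cost function and policy. It is easy to verify that

$$J^{*(\ell)}(p) = \sum_{j=1}^{\infty} \min\left\{ [pP^j]_\ell,\ \sum_{i=1}^{n} c\,[pP^j]_i \right\} \tag{4.18}$$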

is indeed a solution to (4.17). In other words, at each time step we incur a cost that is the minimum of the expected tracking cost at sensor $\ell$ and the expected energy cost at sensor $\ell$. Define the set $U(p)$ as

$$U(p) = \left\{ u : [pP^{u+1}]_\ell > \sum_{i=1}^{n} c\,[pP^{u+1}]_i \right\} \tag{4.19}$$

It is then easily verified that the per-sensor policy $\mu^{*(\ell)}(p)$ has the form

$$\mu^{*(\ell)}(p) = \min_{u \in U(p)} u \tag{4.20}$$

More  simply,  the  policy  is  to  come  awake  at  the  first  time  such  that  the  expected  tracking  cost  exceeds  the 
expected  energy  cost.  This  is  why  this  solution  is  called  the  first  cost  reduction  solution. 
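The FCR rule can be computed by propagating the belief forward until the tracking term first exceeds the energy term. The sketch below is a minimal illustration: the function name is hypothetical, cells are indexed from zero with the terminal state last, and the search is cut off at `max_sleep` (a truncation not present in the definition of $U(p)$).

```python
import numpy as np


def fcr_sleep_time(p, P, sensor, c, max_sleep=1000):
    """Smallest u such that [p P^(u+1)]_sensor > c * sum_i [p P^(u+1)]_i,
    with the sum taken over the n network cells (terminal state excluded).
    Returns None if no such u <= max_sleep exists."""
    q = np.asarray(p, dtype=float)
    P = np.asarray(P, dtype=float)
    for u in range(max_sleep + 1):
        q = q @ P                          # q now equals p P^(u+1)
        if q[sensor] > c * q[:-1].sum():
            return u
    return None
```

For an object that moves deterministically along a line of cells, a sensor two cells ahead of the object is told to sleep for one step and wake exactly when the object arrives, while a sensor the object has already passed is never woken within the search horizon.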

1.1.3. QMDP Solution

A QMDP solution is one in which it is assumed that the partially observed state becomes fully known after a control input has been chosen. In our problem, this means assuming that we will have perfect future observations, i.e., the location of the object will be known in the future. In other words, we are replacing (4.9) with

$$p_{k+1} = e_{b_{k+1}} \tag{4.21}$$

Note that this does not mean that it will be impossible to incur tracking errors; we are simply making an assumption about the future state evolution in order to generate a sleeping policy. We can solve the equation

$$J^{(\ell)}(p) = \min_{u} \left( \sum_{j=1}^{u} [pP^j]_\ell + \sum_{i=1}^{n} c\,[pP^{u+1}]_i + \sum_{i=1}^{n} [pP^{u+1}]_i\, J^{(\ell)}(e_i) \right) \tag{4.22}$$

to find the cost function and policy.

Clearly,  if  we  can  solve  (4.22)  for  p  =  e/,  for  all  b  6  {1, . . .  ,n},  then  it  is  straightforward  to  find 

the  solution  for  all  other  values  of  p.  We  therefore  concern  ourselves  with  finding  values  of  and 

that  satisfy  (4.22)  for  all  b  G  {1 .... ,  n } .  This  can  be  achieved  through  policy  iteration.  Note  that 
for  the  Qmdp  solution,  we  arc  assuming  more  information  than  is  actually  available.  Thus,  the  cost  function 
obtained  under  the  QMDP  is  a  lower  bound  on  optimal  performance. 
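The fixed point of (4.22) at the point masses can be approximated numerically. The sketch below uses value iteration rather than the policy iteration mentioned above, simply because it is shorter to write; the truncation of sleep times at `u_max`, the iteration count, the function name, and the toy network are all assumptions of this example.

```python
import numpy as np

def qmdp_point_mass(P, ell, c, u_max=20, iters=200):
    """Approximate J(e_b) and a minimizing sleep time for each point mass e_b."""
    n = P.shape[0] - 1                                  # last index is the terminal state
    Pj = [np.linalg.matrix_power(P, j) for j in range(u_max + 2)]
    J = np.zeros(n)
    for _ in range(iters):
        Q = np.empty((n, u_max + 1))
        for u in range(u_max + 1):
            # sum_{j=1}^{u} [e_b P^j]_ell for every starting point b
            track = sum((Pj[j][:n, ell] for j in range(1, u + 1)), np.zeros(n))
            inside = Pj[u + 1][:n, :n]                  # [e_b P^(u+1)]_i for i in the network
            Q[:, u] = track + c * inside.sum(axis=1) + inside @ J
        J = Q.min(axis=1)                               # Bellman backup over sleep times
    return J, Q.argmin(axis=1)

# Toy network (assumption): 5 locations, steps of -1 or +1 with equal probability.
n = 5
P = np.zeros((n + 1, n + 1))
for b in range(n):
    for nxt in (b - 1, b + 1):
        P[b, nxt if 0 <= nxt < n else n] += 0.5
P[n, n] = 1.0

J, sleep = qmdp_point_mass(P, ell=2, c=0.1)
```

The returned array `sleep` plays the role of a per-sensor lookup table indexed by the known object location.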

1.1.4.  Point  Mass  Approximations 

The suboptimal policies derived in the preceding sections are considerably easier to compute than the optimal policy and can be computed on-line after some initial off-line computation has been completed. However, such on-line computation requires sufficient processing power and could introduce delays. It would be convenient if the suboptimal $\mu^*$ could be precomputed and stored either at the central controller or distributed across the sensors themselves. The latter option is particularly attractive since it allows for distributed implementation. But the set of possible distributions $p$ is potentially quite large, even if quantization is performed, and could make the storage requirements prohibitive. To make the storage requirements feasible, we considered approximating $p$ with a point mass distribution. The number of sleep times to be stored is then only $n$ per sensor. We considered two options for the placement of the unit point mass when computing the sleep time for sensor $\ell$: (i) the centroid of $p$, and (ii) the nearest point to sensor $\ell$ on the support of $p$. Note that the latter option allows for the implementation of policies without detailed information about the statistics of the random walk; only the support of the random walk is required.
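The two placements of the point mass can be sketched as follows. Rounding the centroid to the nearest location, the function names, and the example belief are assumptions of this illustration, not details given in the report.

```python
import numpy as np

def centroid_point(p, positions):
    """Index of the location closest to the mean of belief p (option (i))."""
    mean = np.dot(p, positions) / p.sum()
    return int(np.argmin(np.abs(positions - mean)))

def nearest_support_point(p, positions, sensor_pos):
    """Index of the support point of p closest to the sensor (option (ii))."""
    support = np.flatnonzero(p > 0)
    return int(support[np.argmin(np.abs(positions[support] - sensor_pos))])

positions = np.arange(7.0)                       # locations 0..6 on a line
p = np.array([0.0, 0.0, 0.25, 0.5, 0.25, 0.0, 0.0])

b_centroid = centroid_point(p, positions)
b_nearest = nearest_support_point(p, positions, sensor_pos=0.0)
```

Either index can then be used to look up a precomputed sleep time, so each sensor needs to store only one entry per location.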

1.1.5.  Numerical  Results 

In  this  section,  we  show  some  results  that  illustrate  the  performance  of  the  policies  we  derived  in  previous 
sections.  We  begin  with  simulation  results  for  one-dimensional  sensor  networks.  In  each  simulation  run, 
the  object  was  initially  placed  at  the  center  of  the  network  and  the  location  of  the  object  was  made  known  to 
each  sensor.  A  simulation  run  concluded  when  the  object  left  the  network.  The  results  of  many  simulation 
runs  were  then  averaged  to  compute  an  average  tracking  cost  and  an  average  energy  cost. 

We  present  results  for  a  one-dimensional  network  with  61  sensors  where  the  object  can  move  anywhere 
from  three  positions  to  the  left  to  three  positions  to  the  right  at  each  time  step,  with  the  object  movement 
being  uniformly  distributed  on  these  seven  positions. 
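For concreteness, the movement model just described can be written as a transition matrix with one extra absorbing state for the object leaving the network. This is a sketch; the function name `random_walk_matrix` and the 0-based indexing are assumptions of the example.

```python
import numpy as np

def random_walk_matrix(n, max_step):
    """Transition matrix on n locations plus one absorbing terminal state.

    Index n is the terminal state reached when the object leaves the network.
    Steps are uniform on {-max_step, ..., +max_step}.
    """
    P = np.zeros((n + 1, n + 1))
    prob = 1.0 / (2 * max_step + 1)
    for b in range(n):
        for step in range(-max_step, max_step + 1):
            nxt = b + step
            if 0 <= nxt < n:
                P[b, nxt] += prob
            else:
                P[b, n] += prob          # object exits the network
    P[n, n] = 1.0                        # terminal state is absorbing
    return P

P = random_walk_matrix(61, 3)            # 61 sensors, up to three positions per step
p0 = np.zeros(62)
p0[30] = 1.0                             # object starts at the center
p5 = p0 @ np.linalg.matrix_power(P, 5)   # belief five steps ahead with no observations
```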

In Figure 4.2, we show the tradeoff curves between energy cost and tracking cost. From these data, we see that the QMDP policy outperforms the FCR policy, although the difference does not appear to be large. Note that the difference between the performance of the QMDP policy and the lower bound on optimal performance becomes small as the number of tracking errors becomes small. This makes sense since when there are few tracking errors, the QMDP assumption (that we will know the object location in the future) becomes realistic. In Figure 4.2(b), we explore the impact of using the point mass approximations on the performance of the QMDP policy. Four curves are shown in each figure. The first two are the lower bound and QMDP tradeoff curves already seen. The third and fourth curves are the tradeoff curves for the QMDP policy using the centroid and nearest-point point mass approximations, respectively. It can be seen that there is indeed some loss in performance when using point mass approximations, but this loss becomes small as the number of tracking errors becomes small. This makes sense since when tracking errors are infrequent, the object location is usually known exactly and so the distribution is usually already a point mass distribution.

Figure 4.2: Energy-Tracking tradeoff curves, panels (a) and (b).

For the moment, consider the traditional duty cycle scheme where each sensor is awake in a fraction $\pi$ of the time slots. As $\pi$ is varied, we achieve a tradeoff curve that is a straight line between the points (0, 1) and (n, 0) (where n is the appropriate number of sensors). When compared with this policy, the schemes we have proposed result in significant improvement, as seen in Figure 4.3(a). In Figure 4.3(b), we consider a two-dimensional network. The network considered is an 11 × 11 grid (121 nodes). The movement of the object is best described by stating that at each time step the object starts at the center of a 3 × 3 grid and moves to any of the nine spaces on that grid with equal probability. The results of Figure 4.3 are similar to those already seen in Figure 4.2.

1.2.  Generalized  Models 

In  this  section  we  summarize  extensions  to  the  work  presented  in  Section  1.1.  We  extend  our  analysis  to 
more  generalized  models  for  object  movement,  object  sensing,  and  tracking  cost  [60].  We  allow  the  number 
of  possible  object  locations  to  be  different  from  the  number  of  sensors.  The  number  of  possible  object 
locations  can  even  be  infinite  to  model  the  movement  of  an  object  on  a  continuum.  Moreover,  the  object 
sensing  model  allows  for  an  arbitrary  distribution  for  the  observations  given  the  current  object  location, 
and  the  tracking  cost  is  modeled  via  an  arbitrary  distance  measure  between  the  actual  and  estimated  object 
location. 

Not  surprisingly,  this  generalization  results  in  a  problem  that  is  much  more  difficult  to  analyze.  Our 
approach  is  to  build  on  the  policies  developed  for  the  simplified  model  of  Section  1.1.  The  design  of  those 
policies relied on the separation of the problem into a set of simpler subproblems. In [58], we have shown that under an observable-after-control assumption, the design problem lends itself to a natural decomposition into simpler per-sensor subproblems due to the simplified nature of the tracking cost structure. Unfortunately, this does not extend to the generalized cases we consider herein. However, based on the intuition gained from the structure of the solution in the simplified case, we artificially separate our problem into a set of simpler per-sensor subproblems. The parameters of these subproblems are not known a priori due to the difficulties in analysis. However, we use Monte Carlo simulation and learning algorithms to compute these parameters. We characterize the performance of the resulting sleeping policies through simulation. For the special case of a discrete state space with continuous Gaussian observations, we derive a lower bound on the optimal energy-tracking tradeoff, which is shown to be loose in the high tracking error regime but reasonably tight in the low tracking error regime.

Figure 4.3: Comparison to conventional duty cycle approach and tradeoffs for a 2-D network. (a) Comparison to duty cycle. (b) Tradeoff curves for a two-dimensional network.

We extend the definitions presented in Section 1.1 to account for the new models. We denote the set of possible object locations as $\mathcal{B}$ such that $|\mathcal{B}| = m + 1$, where the $(m+1)$-th state represents the absorbing terminal state that occurs when the object leaves the network. If $\mathcal{B}$ is not a finite set then $m$ is $\infty$. We define a kernel $P$ such that $P(x, \mathcal{Y})$ is the probability that the next object location is in the set $\mathcal{Y} \subset \mathcal{B}$ given that the current object location is $x$. Suppose $p$ is a probability measure on $\mathcal{B}$ such that $p(\mathcal{X})$ for $\mathcal{X} \subset \mathcal{B}$ is the probability that the state is in $\mathcal{X}$ at the current time step. Then the probability that the state will be in $\mathcal{Y}$ after $t$ time steps in the future is given by

$$ (pP^{t})(\mathcal{Y}) = \int_{\mathcal{B}} p(dx)\, P^{t}(x, \mathcal{Y}) \tag{4.23} $$

This defines the measure $pP^{t}$, which depends on both the prior $p$ and the transition kernel $P$. Also, let $\delta_x$ denote a probability measure such that $\delta_x(A) = 1$ if $x \in A$, and $\delta_x(A) = 0$ otherwise. Conditioned on the object state $b_k$, the future state $b_{k+1}$ has distribution $\delta_{b_k} P$. This defines the evolution of the object location. For a discrete state space this was simply the probability mass function defined by the $b_k$-th row of a transition matrix $P$.

To define the tracking cost, we first define the estimated object location at time $k$ to be $\hat{b}_k$. We can think of $\hat{b}_k$ as an additional control input that is a function of $I_k$, i.e., $\hat{b}_k = \beta_k(I_k)$. Since $\hat{b}_k$ does not affect the state evolution, we do not need past values of this control input in $I_k$. The tracking cost is a distance measure that is a function of the actual and estimated object locations and is written as $d(b_k, \hat{b}_k)$. We assume that $d$ is a bounded function on $\mathcal{B} \times \mathcal{B}$. Two examples of distance measures we might employ are the Hamming cost (if the space $\mathcal{B}$ is finite), i.e.,

$$ d(b_k, \hat{b}_k) = 1_{\{b_k \ne \hat{b}_k\}} \tag{4.24} $$

and the squared Euclidean distance (if the space $\mathcal{B}$ is a subset of an appropriate vector space), i.e.,

$$ d(b_k, \hat{b}_k) = \| b_k - \hat{b}_k \|_2^2 \tag{4.25} $$

Recall that the input $\hat{b}_k$ does not affect the state evolution; it only affects the cost. Therefore, we can compute the optimal choice of $\hat{b}_k$, given by $\beta_k^*(I_k)$, using an optimization minimizing the tracking error over a single time step. We can thus write

$$ \beta_k^*(I_k) = \arg\min_{\hat{b}} E\left[ d(b_k, \hat{b}) \mid I_k \right] \tag{4.26} $$


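Remembering that once the terminal state is reached no further cost is incurred, we can write the total cost for time step $k$ as

$$ g(b_k, I_k) = 1_{\{b_k \ne \mathcal{T}\}} \left( d(b_k, \beta_k^*(I_k)) + \sum_{\ell=1}^{n} c\, 1_{\{r_{k,\ell} = 0\}} \right) \tag{4.27} $$

The infinite horizon cost for the system is given by

$$ J(I_0, \mu_0, \mu_1, \ldots) = E\left[ \sum_{k=1}^{\infty} g(b_k, I_k) \right] \tag{4.28} $$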


The solution to an optimization problem equivalent to the one in (4.12) for each value of $c$ yields an optimal sleeping policy. The task of recursively computing $p_k$ for each $k$ is a problem in nonlinear filtering. In other words, $p_{k+1}$ can be computed using standard Bayesian techniques as the posterior measure resulting from prior measure $p_k P$ and observations $s_{k+1}$. We can then write our dynamic programming problem in terms of the sufficient statistic and solve for the optimal policy $\mu_k(p_k, r_k)$. Since an optimal solution could not be found for the simpler problem considered in Section 1.1, we immediately turn our attention to finding suboptimal solutions to the generalized version of the problem.
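For a finite state space, the Bayesian recursion mentioned above reduces to a predict-then-correct step on a probability vector. The following is a minimal sketch; the function name `belief_update` and the convention of passing the observation likelihood as a vector are assumptions of this example.

```python
import numpy as np

def belief_update(p, P, likelihood):
    """One step of a discrete Bayes filter: predict with P, then correct.

    likelihood[b] is the probability (or density) of the received observations
    given that the object is at state b; pass all ones when no sensor is awake.
    """
    prior = p @ P                        # prior measure p_k P
    post = prior * likelihood            # unnormalized posterior
    total = post.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return post / total

# Toy three-state chain (assumption); the last state is the terminal state.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
p_k = np.array([1.0, 0.0, 0.0])
p_next = belief_update(p_k, P, likelihood=np.array([0.2, 0.6, 1.0]))
```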

In  general,  the  cooperation  among  the  sensors  may  be  difficult  to  analyze  and  understand.  Furthermore, 
unlike  the  simple  model,  the  tracking  cost  may  not  be  easily  written  as  a  sum  across  the  sensors.  To  generate 
suboptimal  solutions  we  artificially  write  the  problem  as  a  set  of  subproblems  that  can  be  solved  using  similar 
techniques.  The  tracking  cost  expressions  (which  are  a  function  of  the  sleeping  actions  of  the  sensors)  in
these  subproblems  will  be  left  as  unknowns.  To  determine  appropriate  values  for  these  tracking  costs,  we 
either  perform  Monte  Carlo  simulations  before  tracking  begins  or  use  data  gathered  during  tracking.  The 
intuition  is  that  if  the  resultant  tracking  cost  expressions  capture  the  “typical”  behavior  of  the  actual  tracking 
cost,  then  our  sleeping  policies  should  perform  well. 

Our approach has two main ingredients. First, we make assumptions about the observations that will be available to the controller at future time steps. To generate sleeping policies, we assume that the system is either perfectly observable or totally unobservable after control. Hence, we define approximate recursions with special structure as surrogates for the optimal value function. Second, we devise different methodologies to evaluate suitable tracking costs whereby we capture the effect of each sensor on the overall tracking cost. Writing the combined tracking cost as the sum of independent contributions of different sensors (with respect to some baseline) allows us to write the Bellman equation as the sum of per-sensor recursions. Instead of solving the exact Bellman equation, we solve $n$ simpler Bellman equations to find per-sensor policies and cost functions. The overall policy is then the per-sensor policies applied in parallel.

We denote by $J^{(\ell)}$ the cost function of the $\ell$-th sensor approximate subproblem. We define $T_\Delta(b, \ell)$ to be the increase in tracking cost due to not waking up sensor $\ell$ at time $k$ given that $b_{k-1} = b$. This is meant to capture the contribution of the $\ell$-th sensor to the total tracking cost. Next we define our approximations.
1.2.1.  QMDP

Based on the structure presented in Section 1.1, we can readily define a QMDP per-sensor Bellman equation analogous to the one in (4.22):

$$ J^{(\ell)}(p) = \min_{u} \left( \sum_{j=0}^{u-1} \int_{\mathcal{B}-\mathcal{T}} T_{\Delta}(b,\ell)\,(pP^{j})(db) + \int_{\mathcal{B}-\mathcal{T}} \left( c + J^{(\ell)}(\delta_{b}) \right) (pP^{u+1})(db) \right) \tag{4.29} $$

The first summation on the right-hand side of (4.29) corresponds to the expected tracking cost incurred by the sleep duration $u$ of sensor $\ell$. The second term consists of: (i) the energy cost incurred as the sensor comes awake after its sleep timer expires (after $u + 1$ time slots); and (ii) the cost to go under an observable-after-control assumption (hence the belief state is $\delta_b$).

We cannot find an analytical solution for (4.29). However, note that if we can solve (4.29) for $p = \delta_b$ for all $b$, then it is straightforward to find the solution for all values of $p$. Thus, given a function $T_\Delta$, (4.29) can be solved through standard policy iteration, but only if $\mathcal{B}$ is finite.

1.2.2.  First  Cost  Reduction  (FCR) 

Similarly, we define a First Cost Reduction (FCR) Bellman equation analogous to the one in (4.17) as

$$ J^{(\ell)}(p) = \min_{u} \left( \sum_{j=0}^{u-1} \int_{\mathcal{B}-\mathcal{T}} T_{\Delta}(b,\ell)\,(pP^{j})(db) + c \int_{\mathcal{B}-\mathcal{T}} (pP^{u+1})(db) + J^{(\ell)}(pP^{u+1}) \right) \tag{4.30} $$

In this case, it is assumed that we will have no future observations. In other words, we define the belief evolution as $p_{k+1} = p_k P$. Given a function $T_\Delta$, it is easy to verify that the solution to (4.30) is

$$ J^{(\ell)}(p) = \sum_{j=0}^{\infty} \min\left\{ \int_{\mathcal{B}-\mathcal{T}} T_{\Delta}(b,\ell)\,(pP^{j})(db),\ c \int_{\mathcal{B}-\mathcal{T}} (pP^{j+1})(db) \right\} \tag{4.31} $$

and the associated policy is to choose the first value of $u$ such that

$$ c \int_{\mathcal{B}-\mathcal{T}} (pP^{u+1})(db) \le \int_{\mathcal{B}-\mathcal{T}} T_{\Delta}(b,\ell)\,(pP^{u})(db) \tag{4.32} $$

In other words, the policy is to come awake at the first time the expected tracking cost exceeds the expected energy cost, where the tracking cost is defined based on $T_\Delta$ (to be determined).

The solutions to the per-sensor Bellman equations in (4.29) and (4.30) define the QMDP and FCR policies for each sensor, respectively. Note that, unlike the simplified case, the solution to the QMDP recursion does not necessarily provide a lower bound on the optimal value function since the employed tracking cost is not a lower bound on the actual tracking cost. Later, we present a lower bound on the optimal energy-tracking tradeoff for discrete state spaces with Gaussian observations. The remaining task is to identify appropriate values of $T_\Delta(b, \ell)$ for all $b \ne \mathcal{T}$ and for all $\ell$. This is the subject of the next two sections.

1.2.3.  Non-learning  Approach 

For now, suppose that $\mathcal{B}$ is a finite space. Suppose $b_{k-1} = b$. To generate $T_\Delta(b, \ell)$ for a particular $\ell$, we first assume a “baseline” behavior for the sensors, i.e., we make an assumption about the set of sensors that are awake at time $k$ given that $b_{k-1} = b$. We consider two possibilities: (i) that all sensors are asleep; (ii) that the set of sensors awake is selected through a greedy algorithm, in which the sensor that causes the largest decrease in expected tracking cost is added to the awake set until any further reduction due to a single sensor is less than $c$. Starting with this set of awake sensors, the value of $T_\Delta(b, \ell)$ is then computed as the absolute difference in expected tracking cost incurred by changing the state of sensor $\ell$. Monte Carlo simulation can be used to evaluate the change in expected tracking cost.
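The greedy baseline and the resulting cost increment can be sketched as below. The callable `expected_cost` stands in for the Monte Carlo estimate of expected tracking cost; it, the function names, and the toy detection probabilities are hypothetical and serve only to make the example runnable.

```python
import math

def greedy_awake_set(sensors, expected_cost, c):
    """Add the sensor with the largest cost decrease until that decrease is < c."""
    awake = set()
    current = expected_cost(awake)
    while len(awake) < len(sensors):
        best = min((s for s in sensors if s not in awake),
                   key=lambda s: expected_cost(awake | {s}))
        best_cost = expected_cost(awake | {best})
        if current - best_cost < c:      # no single sensor is worth its energy cost
            break
        awake.add(best)
        current = best_cost
    return awake

def tracking_increment(sensor, awake, expected_cost):
    """Absolute change in expected tracking cost when `sensor` changes state."""
    return abs(expected_cost(awake ^ {sensor}) - expected_cost(awake))

# Toy stand-in for the Monte Carlo estimate (hypothetical): each awake sensor
# independently detects the object with the given probability.
detect = {0: 0.6, 1: 0.3, 2: 0.05}

def toy_cost(awake):
    return math.prod(1.0 - detect[s] for s in awake)

awake = greedy_awake_set(list(detect), toy_cost, c=0.1)
increment_0 = tracking_increment(0, awake, toy_cost)
```

With the all-asleep baseline, the same `tracking_increment` call is simply made with an empty awake set.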
If $\mathcal{B}$ is not finite, then a parameterized version of $T_\Delta$ can be computed instead. We choose a finite set of elements of $\mathcal{B} - \mathcal{T}$ and evaluate $T_\Delta$ at these points. The value of $T_\Delta$ at all other values of $b \in \mathcal{B} - \mathcal{T}$ can be computed via an interpolation algorithm. Recall that only an FCR policy is appropriate in the infinite state case, since solving the QMDP Bellman equation for an infinite number of point mass distributions is infeasible.
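One simple choice of interpolation is piecewise-linear interpolation on a one-dimensional grid, sketched here with NumPy. The grid of integer locations matches the one used later for Network C; the tabulated values in `t_delta_grid` are hypothetical placeholders, not results from the report.

```python
import numpy as np

grid = np.arange(1.0, 22.0)                       # integer locations 1..21
t_delta_grid = np.exp(-0.5 * (grid - 8.0) ** 2)   # hypothetical values for one sensor

def t_delta(b):
    """Piecewise-linear interpolation of the tabulated values between grid points."""
    return np.interp(b, grid, t_delta_grid)

value_between = t_delta(8.5)                      # halfway between locations 8 and 9
```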

1.2.4.  Learning  Approach 

Next, we describe an alternative learning-based approach. For ease of exposition, suppose that $\mathcal{B}$ is a finite space. Then our probability measure $p_k$ can be characterized by a probability mass function. We refer to this probability mass function as $p_k$ (a row vector). Define $\hat{a}_{k,\ell}$ to be the approximated expected increase in tracking cost due to sensor $\ell$ sleeping at time $k$ as

$$ \hat{a}_{k,\ell} = \sum_{b \ne \mathcal{T}} p_{k-1}(b)\, T_\Delta(b, \ell) \tag{4.33} $$

Ideally, we would like this approximation to be equal to the actual expected increase in tracking cost due to sensor $\ell$ sleeping. Unfortunately, we do not have access to actual tracking costs at time $k$ since $b_k$ is not known exactly. However, we do have access to $p_k$, $r_k$, and $p_{k-1}$. It is therefore possible to estimate the tracking cost as

$$ \int_{\mathcal{B}} d(b, \beta^*(p_k))\, p_k(db) \tag{4.34} $$

For example, if Hamming cost is being used, then we can estimate the tracking cost as $1 - \max_b p_k(\{b\})$, and if squared Euclidean distance is being used we can estimate the tracking cost using the variance of the measure $p_k$. Next we describe how we learn $T_\Delta$ by solving a least squares problem.
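Both estimates follow directly from the belief vector in the finite case. The sketch below assumes the belief is given over the non-terminal states and is normalized; the function names are illustrative.

```python
import numpy as np

def hamming_cost_estimate(p):
    """Expected Hamming cost of the most likely location: 1 - max_b p(b)."""
    return 1.0 - float(np.max(p))

def squared_error_estimate(p, positions):
    """Expected squared error of the mean estimate, i.e. the variance of p."""
    mean = np.dot(p, positions)
    return float(np.dot(p, (positions - mean) ** 2))

p = np.array([0.2, 0.5, 0.3])
positions = np.array([0.0, 1.0, 2.0])
hamming = hamming_cost_estimate(p)
variance = squared_error_estimate(p, positions)
```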

Determining an estimate of the increase in the tracking cost due to the sleeping of sensor $\ell$ at time $k$, denoted $a_{k,\ell}$, depends on the value of $r_{k,\ell}$. If $r_{k,\ell} = 0$, we ignore the observation from sensor $\ell$ and generate a new version of $p_k$ called $p'_k$. We can compute $a_{k,\ell}$ as

$$ a_{k,\ell} = \sum_{b \ne \mathcal{T}} p'_k(b)\, d(b, \beta^*(p'_k)) - \sum_{b \ne \mathcal{T}} p_k(b)\, d(b, \beta^*(p_k)) \tag{4.35} $$

If on the other hand $r_{k,\ell} > 0$, we first generate an object location $b'_k$ according to $p_k$ and then generate an observation according to the observation distribution associated with $b'_k$. This observation is used to generate a new distribution $p'_k$ from $p_k$. Then we compute $a_{k,\ell}$ as

$$ a_{k,\ell} = \sum_{b \ne \mathcal{T}} p_k(b)\, d(b, \beta^*(p_k)) - \sum_{b \ne \mathcal{T}} p'_k(b)\, d(b, \beta^*(p'_k)) \tag{4.36} $$

We now have an approximation sequence $\hat{a}_{k,\ell}$ and an observation sequence $a_{k,\ell}$. At time $k - 1$, our goal is to choose $T_\Delta$ to minimize

$$ E\left[ (\hat{a}_{k,\ell} - a_{k,\ell})^2 \right] \tag{4.37} $$

We apply the Robbins-Monro algorithm [135], a form of stochastic gradient descent, to this problem in order to recursively compute a sequence of $T_\Delta$ that will hopefully solve this minimization problem for large $k$. The update equation is

$$ T_\Delta^{k}(b, \ell) = T_\Delta^{k-1}(b, \ell) - 2 \alpha_k\, 1_{\{b \ne \mathcal{T}\}}\, p_{k-1}(b)\, (\hat{a}_{k,\ell} - a_{k,\ell}) \tag{4.38} $$

where $\alpha_k$ is a step size. Note that $1_{\{b \ne \mathcal{T}\}}\, p_{k-1}(b)$ is the gradient of $\hat{a}_{k,\ell}$ with respect to $T_\Delta(b, \ell)$.

Using a constant step size in our simulations, we observed only small oscillations in the values of $T_\Delta$. If $\mathcal{B}$ is not finite, then we can again parameterize $T_\Delta$. The Robbins-Monro algorithm can be applied in this context as well, although the gradient expressions will depend on the type of interpolation used.
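For a finite state space, one step of the update in (4.38) for a single sensor can be sketched as follows. The vectors are taken over the non-terminal states only, so the indicator in (4.38) is implicit; the function name and the toy numbers are assumptions of this example.

```python
import numpy as np

def robbins_monro_step(T, p_prev, a_obs, alpha):
    """One stochastic-gradient update of T_Delta(., ell) for a single sensor.

    T      : current values of T_Delta(b, ell) over non-terminal states b
    p_prev : belief p_{k-1} over the same states
    a_obs  : observed increase in tracking cost a_{k,ell}
    alpha  : step size
    """
    a_hat = float(np.dot(p_prev, T))               # approximation (4.33)
    return T - 2.0 * alpha * p_prev * (a_hat - a_obs)

T = np.zeros(2)
p_prev = np.array([0.5, 0.5])
T_one = robbins_monro_step(T, p_prev, a_obs=1.0, alpha=0.1)

# Repeating the step with the same data drives the approximation toward a_obs.
T_many = T.copy()
for _ in range(200):
    T_many = robbins_monro_step(T_many, p_prev, a_obs=1.0, alpha=0.1)
```

In practice the belief and the observed increment change at every time step, so the iterates fluctuate around a solution rather than settling exactly, which is consistent with the small oscillations reported for a constant step size.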
1.2.5.  A  Lower  Bound

Deriving a lower bound is generally difficult for the considered problem. However, we derived a lower bound for the special case of a discrete state space with Gaussian observations. The idea is to combine the observable-after-control assumption with a separable lower bound on the tracking cost, as we demonstrate in what follows. When awake, the sensors' observations are Gaussian, i.e.,

+  <439) 

where  up  is  the  location  of  sensor  $\ell$  and  $V$  is  some  positive  constant.

First,  the  following  Lemma  provides  a  lower  bound  on  the  expected  tracking  cost. 


Lemma 1.1. Given the current belief $p_k$, an action vector $u_k$, the current residual sleep times vector $r_k$, the Gaussian observation model in (4.39), the Hamming cost definition in (4.24), and a mean received signal strength $m_j$ when the target is at state $j$, the expected tracking cost is lower bounded by


m  m  fd  ffi  —  \ 

E[d(bp+i,bk+1)\pk,uk}  =  Y  Pk  (*)  Y  P(bk+i  =  3 1 ' bk  =  *)  ™  Q  l  Y  +  (4-40) 

*= i  j= i  ^  \  ki  J 


where  m  is  the  size  of  the  discrete  state  space,  i.e.,  the  number  of  possible  object  locations,  =  - 'fp — — 

A rripj  =  trip  —  rrij,  and  Q{.)  is  the  normal  distribution  Q-function. 

Since the mean received signal strength depends on whether the sensors are awake or asleep, the distance dpj is a function of the next step residual sleep vector $r_{k+1}$. To highlight this dependence, we will sometimes use the notation dpj(r) when needed.

The  next  step  is  to  use  this  result  to  compute  a  separable  bound  on  the  tracking  error,  which  combined 
with  an  observable-after-control  assumption  would  lead  to  a  decomposable  lower  bound  on  the  optimal  value 
function.  The  idea  is  to  separate  out  the  contribution  of  every  sensor  by  assuming  that  every  other  sensor 
is  awake  and  study  the  tracking  error  when  that  sensor  is  awake  or  asleep.  Our  next  Lemma  establishes  a 
separable  lower  bound  on  the  expected  tracking  cost. 


Lemma  1.2.  The  expected  tracking  cost  is  lower  bounded  by 

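$$ E\left[ d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k, r_k \right] \ge \sum_{\ell=1}^{n} \lambda_{\ell}(p_k) \left\{ 1_{\{r_{k+1,\ell} = 0\}} \sum_{i=1}^{m} p_k(i)\, T_0(p; i, \ell) + 1_{\{r_{k+1,\ell} > 0\}} \sum_{i=1}^{m} p_k(i)\, T(p; i, \ell) \right\} \tag{4.41} $$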

where 

m  /  ,  /j-.-.  \Pp\j 

To(p ;  i,t)  =  £p(6fc+i  =  j\h  =  i)  maxQ  I  - F 

k ^  \  2  dpj(  U) 

m  ( rt  (V\  \  ^-p^j 

T{p\  i,  t)  =  Yp(bk+ 1  =3 \h  =  i)  max Q  (  —  ^ 

where $0$ is the all zero vector and $0_{-\ell}$ denotes a vector of length $n$ with all entries equal to zero except for the $\ell$-th entry, which can be anything greater than 0.

Intuitively, $T_0(p; i, \ell)$ represents the contribution of sensor $\ell$ to the total expected tracking cost when the underlying state is $i$, the belief is $p$, and all sensors are awake. On the other hand, $T(p; i, \ell)$ is the $\ell$-th sensor contribution when it is asleep and all the other sensors are awake.

We  can  readily  state  a  lower  bound  on  the  optimal  value  function  in  the  following  Theorem. 
Theorem 1.1. A lower bound on the optimal value function at belief state $e_b$ can be obtained as a solution to the following optimization problem

$$ J(e_b) = \max_{\Lambda(c)} \sum_{\ell=1}^{n} \min_{u_\ell} \Bigg\{ \sum_{j=0}^{u_\ell - 1} \sum_{i=1}^{m} [e_b P^{j}]_i\, \lambda(i, \ell)\, T(i, \ell) + \sum_{i=1}^{m} [e_b P^{u_\ell}]_i\, \lambda(i, \ell)\, T_0(i, \ell) + c \sum_{i=1}^{m} [e_b P^{u_\ell + 1}]_i + \sum_{i=1}^{m} [e_b P^{u_\ell + 1}]_i\, J^{(\ell)}(e_i) \Bigg\} \tag{4.42} $$

subject to $\Lambda 1_n = 1_m$,

where $1_m$ is a column vector of all ones of length $m$. The matrix $\Lambda$ is defined for every value of $c$, where $\Lambda(c)$ is an $m \times n$ matrix with the $(i, \ell)$ entry equal to $\lambda(i, \ell)$, i.e., $\Lambda(c) = \{\lambda(i, \ell)\}$. The quantities $T(i, \ell)$ and $\lambda(i, \ell)$ are shorthand for $T(e_i; i, \ell)$ and $\lambda_\ell(e_i)$, respectively. The inner minimization is over the control action $u_\ell$ for sensor $\ell$ given a belief state $e_b$.

A closed form solution for (4.42) cannot be obtained, and hence we solve for $J(e_b)$ numerically. First, we fix $\Lambda$ and use policy iteration to solve for the control of each sensor at each state. Then, we change $\Lambda$ and repeat the process. The envelope of the generated value functions (corresponding to different instances of $\Lambda$) is hence a lower bound on the optimal value function.

1.2.6.  Numerical  Results 

In  this  section,  we  show  some  simulation  results  illustrating  the  performance  of  the  described  policies.  For 
the  non-learning  policies,  the  value  of  $T_\Delta(b, \ell)$  for  each  $b$  and  $\ell$  was  generated  using  200  Monte  Carlo
simulations.  The  results  of  50  simulation  runs  were  averaged  when  plotting  the  curves.  For  the  learning
policies,  the  values  for  $T_\Delta$  were  initialized  to  those  obtained  from  the  non-learning  approach  using  greedy
sensor  selection  as  a  baseline.  A  constant  step  size  of  0.01  was  used  in  the  learning  algorithm. 

We  first  consider  a  simple  network  that  we  term  Network  A.  This  is  a  one-dimensional  network  with  41
possible  object  locations  where  the  object  moves  with  equal  probability  either  one  to  the  left  or  one  to  the 
right  in  each  time  step.  There  is  a  sensor  at  each  of  the  41  object  locations  that  makes  (when  awake)  a  binary 
observation  that  determines  without  error  whether  the  object  is  at  that  location.  Hamming  cost  is  used  for 
the  tracking  cost. 

For  Network  A,  we  illustrate  the  performance  of  the  QMDP  versions  of  our  policies  in  Figure  4.4(a)  and 
the  FCR  versions  of  our  policies  in  Figure  4.4(b). 


Figure 4.4: Tradeoff curves for Network A: (a) QMDP policies; (b) FCR policies.

Figure 4.5: Tradeoff curves for Network B and a lower bound: (a) QMDP policies; (b) FCR policies.


The curves labeled “Asleep” are for the non-learning approach for computing $T_\Delta$ where we assume that all sensors are asleep as a baseline. The curves labeled “Greedy” are for the non-learning approach for computing $T_\Delta$ where we use a greedy algorithm to determine our baseline. The curves labeled “Learning” employ our learning algorithm for computing $T_\Delta$. From the tradeoff curves, it is apparent that using the learning algorithm to compute $T_\Delta$ results in improved performance. A close inspection of Figures 4.4(a) and 4.4(b) reveals that the QMDP policies perform somewhat better than their FCR counterparts. This is consistent with what was observed earlier.


Table 4.1: Object movement for Network B.

Change in Position    0        1        2        3
Probability           0.3125   0.2344   0.0938   0.0156

We then consider a new one-dimensional network termed Network B. The possible object locations are located on the integers from 1 to 21. The object moves according to a random walk anywhere from three steps to the left to three steps to the right in each time step. The distribution of these movements is given in Table 4.1. The change in position indicates movement by the corresponding number of steps to the right or to the left. There are 10 sensors in this network, so that $m \ne n$. The locations of the sensors are given in Table 4.2 and awake sensors make Gaussian observations as in (4.39).

Table 4.2: Sensor locations for Network B.

Sensor     1      2      3      4      5       6       7       8       9       10
Location   1.36   1.61   3.91   8.09   11.96   13.39   13.52   13.66   16.60   18.68

Results for the QMDP and FCR versions of our policies are shown in Figures 4.5(a) and 4.5(b), respectively. The results confirm the same general trends observed for Network A. The figures also show our derived lower bound on the energy-tracking tradeoff. Not surprisingly, the lower bound is particularly loose in the high tracking cost regime, yet the gap is reasonably small in the low tracking error regime. This is expected since the lower bound uses an all-awake assumption to lower bound the contribution of each sensor to the tracking error. However, it is worth mentioning that we can exactly compute the saturation point for the optimal scheduling policy, which matches the saturation limit of the shown curves, since every policy has to eventually meet the all-asleep performance curve when the energy cost per sensor is high. At that point, all sensors are put to sleep and hence the target estimate can only be based on prior information. The small gap in the low tracking error regime combined with the aforementioned saturation effect highlights the good performance of our sleeping policies.

Our results are not restricted to 1-D networks but easily extend to 2-D networks. Namely, Figure
4.6(right) shows the energy-tracking tradeoff of the QMDP and FCR policies for the 2-D network of Figure
4.6(left) with continuous observations and Hamming cost.

Figure 4.6: (Left) 2-D network with 17 sensors (stars) and 25 possible object locations (squares). (Right)
Energy-tracking tradeoff of the QMDP and FCR sleeping policies for a 2-D network with continuous
observations and Hamming cost.

To demonstrate that our techniques can be applied to an object that moves on a continuum, we define a new
network, Network C. This network is identical to Network B except for two changes. First, the object can
take locations anywhere on the interval [1,21]. Second, the object moves according to Brownian motion,
with the change in position between time steps having a Gaussian distribution with mean zero and variance
1. As mentioned earlier, only FCR policies can be generated for this type of network. Values of TA were
computed for each integer-valued object location on [1,21], and linear interpolation was used to compute
values of TA for other object locations. Since continuous distributions cannot be easily stored, particle
filtering techniques were employed. The number of particles used was 512, and resampling was performed
at each time step. Tradeoff curves for Network C are shown in Figure 4.7. Although the tradeoff curves are
less smooth than before, this figure illustrates performance trends similar to those already seen.

The  reason  the  curves  are  not  as  smooth  is  that  occasionally  the  particle  filter  would  fail  to  keep  track 
of  the  distribution  with  sufficient  accuracy.  This  would  cause  the  network  to  lose  track  of  the  object  and 
cause  abnormally  bad  tracking  for  that  simulation  run. 
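The particle filtering step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the observation mean `signal_mean` and its `strength` parameter are hypothetical stand-ins for the Gaussian model (4.39), which is not reproduced in this section, and the treatment of an object leaving the interval [1,21] is omitted. Only the particle count (512), the unit-variance Brownian increment, resampling at every step, and the sensor locations of Table 4.2 come from the text.

```python
import math
import random

# Sensor locations copied from Table 4.2 (Network B, reused for Network C).
SENSORS = [1.36, 1.61, 3.91, 8.09, 11.96, 13.39, 13.52, 13.66, 16.60, 18.68]
NUM_PARTICLES = 512  # value quoted in the text


def signal_mean(pos, sensor, strength=10.0):
    """Hypothetical mean observation; a stand-in for (4.39), not the report's model."""
    return strength / ((pos - sensor) ** 2 + 1.0)


def predict(particles, rng):
    """Brownian-motion step: add a zero-mean, unit-variance Gaussian increment."""
    return [x + rng.gauss(0.0, 1.0) for x in particles]


def update(particles, obs, awake, rng):
    """Weight particles by the awake sensors' Gaussian likelihood, then resample."""
    if not any(awake):
        return list(particles)  # all sensors asleep: nothing to incorporate
    log_w = []
    for x in particles:
        ll = 0.0
        for s, z, a in zip(SENSORS, obs, awake):
            if a:
                ll -= 0.5 * (z - signal_mean(x, s)) ** 2
        log_w.append(ll)
    top = max(log_w)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(v - top) for v in log_w]
    # Multinomial resampling at every time step, as stated in the text.
    return rng.choices(particles, weights=weights, k=len(particles))
```

When the weights collapse onto very few particles, resampling can discard the region that contains the true location, which is one plausible mechanism for the occasional loss of track noted above.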


Figure 4.7: Tradeoff curves for FCR policies for Network C.

1.3. Sensor Sleeping for Multi-target Tracking

We extend our analysis to the tracking of multiple objects [59]. A discussion of the tracking of multiple
objects, often termed multitarget tracking (MTT), can be found in [92]. Tracking multiple objects is not
a simple extension of tracking a single object due to the data association problem. This problem arises
whenever the identity of the objects cannot be determined from the observations. Thus, even if all locations
where objects are located are known exactly, it may not be known which location corresponds to which
object. This uncertainty leads to an explosion in the set of possibilities that must be considered and makes
an optimal solution difficult. Suboptimal tracking algorithms are then needed. We therefore formulate a
design problem wherein we keep track of the full joint distribution for the object locations. This approach is
optimal but can quickly become intractable for large numbers of objects. Our simulation results focus on
the two-object case to make computation as simple as possible. However, most of our analysis applies to
the general q-object case, and we indicate how our solutions scale with an increasing number of objects.
The results of our work are a set of suboptimal sleeping policies. These policies are compared with lower
bounds on optimal performance that we derived in the course of our analysis. Our simulation results show
how our suboptimal policies compare with optimal performance. Our policies also perform significantly
better than naive approaches that do not use information about the locations of the objects. Furthermore,
one of our policies uses only information about the marginal distributions of the objects and thus scales well
with increasing numbers of objects and can be used in concert with suboptimal tracking algorithms.

We retain the simplified model of Section 1.1 with non-overlapping sensing regions. An awake sensor
can only detect whether one or more objects are within its range and can detect neither the exact number of
objects present nor which objects are present.

We are interested in tracking q objects that move independently according to their individual first-order
Markov models. We write the combined state of the q objects as a vector of length q. There are $(n+1)^q$
possible states for this vector. The state $T = [n+1, \ldots, n+1]$ is the terminal state that occurs when all
objects have left the network. Once this state is reached, no further cost is incurred.

If we denote the observation available to the central unit at time k by $z_k$, then we have $z_k = (s_k, r_k)$,
where $s_k$ is an $(n+1)$-vector of observations with

$$ s_{k,\ell} = \begin{cases} 0 & \text{if } r_{k,\ell} = 0 \text{ and no objects are at location } \ell \\ 1 & \text{if } r_{k,\ell} = 0 \text{ and one or more objects are at location } \ell \\ \varepsilon & \text{if } r_{k,\ell} > 0 \end{cases} \tag{4.43} $$


where $\varepsilon$ is an erasure that provides no information. We can further decompose tracking error into two
components. The first component is observation error that occurs when we fail to observe a particular object.
The second component is data association error that occurs when the objects have been misidentified. To
perform the object identification, we define the vector of estimated object locations at time k to be
$\hat{b}_k$. We can think of $\hat{b}_k$ as an additional control input that is a function of $I_k$, i.e.,

$$ \hat{b}_k = \beta_k(I_k) \tag{4.44} $$


We combine observation and data association errors by defining a tracking error to have occurred when
either an observation error or a data association error has occurred. A cost of 1 is incurred for each tracking
error. Thus the tracking cost can be written as

$$ \sum_{i=1}^{q} \left( 1 - \mathbf{1}\{r_{k,b_{k,i}} = 0\}\, \mathbf{1}\{\hat{b}_{k,i} = b_{k,i}\} \right) \tag{4.45} $$


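Recall that the $\hat{b}_k$ input does not affect the state evolution; it only affects the cost. We can therefore
compute the optimal choice of $\hat{b}_k$, denoted as $\beta_k^*$, using an optimization minimizing the tracking
error over a single time step. We can thus write

$$ \beta_k^*(I_k) = \arg\min_{\hat{b}} \; E\left[ \sum_{i=1}^{q} \left( 1 - \mathbf{1}\{r_{k,b_{k,i}} = 0\}\, \mathbf{1}\{b_{k,i} = \hat{b}_i\} \right) \,\middle|\, I_k \right] \tag{4.46} $$

and we can write the total cost for time step k as

$$ g(b_k, I_k) = \mathbf{1}\{b_k \neq T\} \left( \sum_{\ell=1}^{n} c\, \mathbf{1}\{r_{k,\ell} = 0\} + \sum_{i=1}^{q} \left( 1 - \mathbf{1}\{r_{k,b_{k,i}} = 0\}\, \mathbf{1}\{\beta_{k,i}^*(I_k) = b_{k,i}\} \right) \right) \tag{4.47} $$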


Since  g  is  bounded  by  (cn  +  q)  and  the  expected  time  till  the  object  leaves  the  network  is  finite,  the  cost 
function  J  is  well-defined. 

The  evolution  of  pk  is  difficult  to  write  mathematically,  but  it  is  a  standard  nonlinear  filtering  operation. 
One  example  of  a  distribution  update  is  shown  in  Figure  4.8. 




Figure  4.8:  An  example  of  a  distribution  update  for  q  =  2  and  n  =  9.  In  each  subfigure,  a  joint  distribution 
for  the  objects  is  shown.  In  (a),  it  is  known  that  one  object  is  located  at  position  3  and  one  is  located 
at  position  6.  In  (b),  the  joint  distribution  at  time  k  +  1  before  incorporating  observations  is  shown.  In 
generating  (c),  we  suppose  that  sensors  1,  5,  6,  and  8  are  awake  and  have  failed  to  observe  the  object.  The 
distribution  in  (c)  is  the  one  that  results  from  incorporating  these  observations. 
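The update illustrated in Figure 4.8 can be sketched for q = 2 and n = 9 as below. The single-object kernel `move` is a hypothetical choice made only for this example (the report does not give the kernel behind the figure), and the helper names are illustrative. The prediction step uses the independence of the object motions; the correction step removes every joint state that places an object at an awake sensor that observed nothing, then renormalizes.

```python
from itertools import product

N = 9          # sensor locations are 1..N
EXIT = N + 1   # state N + 1 means the object has left the network


def move(loc):
    """Hypothetical kernel: stay w.p. 1/2, step left or right w.p. 1/4 each."""
    if loc == EXIT:
        return {EXIT: 1.0}
    out = {}
    for step, pr in ((-1, 0.25), (0, 0.5), (1, 0.25)):
        nxt = loc + step
        if not 1 <= nxt <= N:
            nxt = EXIT  # stepping off either end leaves the network
        out[nxt] = out.get(nxt, 0.0) + pr
    return out


def predict(joint):
    """Propagate the joint distribution one step; the objects move independently."""
    new = {}
    for (b1, b2), pr in joint.items():
        for (c1, p1), (c2, p2) in product(move(b1).items(), move(b2).items()):
            new[(c1, c2)] = new.get((c1, c2), 0.0) + pr * p1 * p2
    return new


def condition_on_empty(joint, awake_empty):
    """Drop states with an object at an awake sensor that saw nothing; renormalize.

    Assumes the observation has nonzero probability under `joint`.
    """
    kept = {b: pr for b, pr in joint.items()
            if b[0] not in awake_empty and b[1] not in awake_empty}
    total = sum(kept.values())
    return {b: pr / total for b, pr in kept.items()}


# (a) one object at 3 and one at 6, with the identities unknown
joint_a = {(3, 6): 0.5, (6, 3): 0.5}
joint_b = predict(joint_a)                            # (b) before observations
joint_c = condition_on_empty(joint_b, {1, 5, 6, 8})   # (c) after observations
```

The dictionary has up to $(n+1)^q$ entries, which is why the joint-distribution approach becomes intractable as q grows.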

For notational convenience, we also define

$$ p_{k,i}(b) = \sum_{b' :\, b'_i = b} p_k(b') \tag{4.48} $$

In other words, $p_{k,i}$ is the marginal distribution for object i. Each component of the vector-valued
function $\beta_k^*$ can be chosen according to

$$ \beta_{k,i}^*(p_k, r_k) = \arg\max_{b} \; \mathbf{1}\{r_{k,b} = 0\}\, p_{k,i}(b) \tag{4.49} $$

In other words, for each object we select the estimated object location from among the locations where a
sensor is awake. From these locations, we select the one with the largest value of the marginal distribution
for that object. Note that $\beta_k^*$ has the same form for every k, so we can drop the subscript and refer to
$\beta_k^*$ as $\beta^*$. Our approach to generating suboptimal solutions is similar to that used earlier.
Namely, we make unrealistic assumptions that greatly simplify the evolution of $p_k$. These assumptions also
allow the tracking and energy costs to be written as a sum of costs, one for each sensor. The result is that
the problem then separates into a set of n simpler subproblems, one for each sensor, that can be more easily
solved.
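The estimate selection described above (marginalize the joint distribution, then pick the most likely location among awake sensors) can be sketched as follows. The dictionary representation of the joint distribution and the function names are assumptions made for illustration; ties are broken toward the smallest location index, a choice the report does not specify.

```python
def marginal(joint, i):
    """Marginal distribution of object i (0-based), in the spirit of (4.48)."""
    out = {}
    for b, pr in joint.items():
        out[b[i]] = out.get(b[i], 0.0) + pr
    return out


def estimate_locations(joint, awake, q):
    """Per object, pick the awake location with the largest marginal probability.

    `awake` must be a non-empty collection of locations whose sensors are awake.
    """
    candidates = sorted(awake)  # fixed order so ties go to the smallest index
    estimates = []
    for i in range(q):
        m = marginal(joint, i)
        estimates.append(max(candidates, key=lambda loc: m.get(loc, 0.0)))
    return estimates


joint = {(2, 7): 0.25, (3, 7): 0.5, (4, 7): 0.25}
print(estimate_locations(joint, {1, 3, 7}, q=2))
```

Because only the marginals enter the selection, the cost of this step grows linearly in q once the marginals are available.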




The QMDP policy is obtained as a solution of the per-sensor Bellman equation for sensor $\ell$:

$$ J^{(\ell)}(p) = \min_u \sum_{b \neq T} \left( \sum_{j=1}^{u} \sum_{i=1}^{q} (pP^{j})(b)\, \mathbf{1}\{b_i = \ell\} + c\,(pP^{u+1})(b) + (pP^{u+1})(b)\, J^{(\ell)}(b) \right) \tag{4.50} $$

Note that the QMDP assumption implies that there will be no future data association errors and thus the only
tracking costs present in designing this policy are observation errors.

We also define an FCR policy that is to select the first value of u such that


$$ \sum_{i=1}^{q} \sum_{b :\, b_i = \ell} (pP^{u+1})(b) \;>\; \sum_{i=1}^{q} \sum_{b :\, b_i \neq n+1} c\,(pP^{u+1})(b) \tag{4.51} $$


The lower bound that results from the QMDP policy is likely to be loose when data association errors
dominate the tracking cost. Hence, we designed a QMDP-like policy that, instead of assuming the state is
known, assumes that

• at the current time step, after selecting sleep times, all sleeping sensors will be allowed to make
observations (with no energy cost), and

• at future time steps, the distribution for the object location will evolve as if all sensors are awake.

This gives rise to the term All Awake (AA) policy. Note that the AA assumption is like assuming we will
have “perfect observations.” However, this does not imply perfect knowledge of the state due to the presence
of data association errors. Note that since we are assuming more information than is actually available, the
AA assumption does yield a lower bound on optimal performance. Due to complexity issues, we only
designed the AA policy for the q = 2 case.

The advantage of the AA assumption is that it allows us to considerably simplify the state space for $p_k$.
Since all the sensors come awake at each time step, the set of at most two locations where an object could
be present is known exactly. Suppose for the moment that there are two distinct locations where an object
is observed. Let $\bar{b}_k = (\bar{b}_{k,1}, \bar{b}_{k,2})$ with $\bar{b}_{k,1} < \bar{b}_{k,2}$ being the
locations where objects are present at time k. Thus, $\bar{b}_k$ belongs to a subset of $\{1, \ldots, n+1\}^2$.
To completely characterize $p_k$ we need only specify the probability that $b_k = \bar{b}_k$. Denote this
probability as $d_k$. Then with probability $1 - d_k$ we have that $b_k = (\bar{b}_{k,2}, \bar{b}_{k,1})$.
Note that if there is only one distinct location where an object is observed we can simply let
$\bar{b}_{k,1} = \bar{b}_{k,2}$ and $d_k = 1$.

The state space for $x_k = (\bar{b}_k, d_k)$ is not finite due to $d_k \in [0,1]$. The approach we take is to
quantize $d_k$ and construct a kernel P for this quantized version of $x_k$. Note that in doing this we no
longer have a true lower bound; however, with fine enough quantization we can well approximate such a
lower bound.

We define functions $T_w$ and $T_s$ such that $T_w(x_k, \ell)$ and $T_s(x_k, \ell)$ are the tracking costs
incurred by sensor $\ell$ when it is awake and asleep, respectively. We showed that the per-sensor Bellman
equation for sensor $\ell$ under the above assumptions is given by

$$ J^{(\ell)}(p) = \min_u \sum_{x \neq T} \left( \sum_{j=1}^{u} (pP^{j})(x)\, T_s(x,\ell) + (pP^{u+1})(x)\, T_w(x,\ell) + c\,(pP^{u+1})(x) + (pP^{u+1})(x)\, J^{(\ell)}(x) \right) \tag{4.52} $$


This  equation  can  be  solved  through  the  use  of  policy  iteration  to  yield  a  policy  and  a  lower  bound. 

1.3.1.  Numerical  Results 

In  this  section,  we  give  some  simulation  results  that  illustrate  the  performance  of  the  policies  we  derived  in 
previous  sections. 

We first consider a network we term Network A. This network is a one-dimensional network with seven
sensors. The small number of sensors was needed because the AA policy must perform policy iteration
for a number of states equal to $\binom{n+1}{2} A + n + 1$, where A is the number of quantization levels for
$d_k$. In our simulations, we used A = 21 for a total of 596 states. Policy iteration for this number of states
required significant computation and the network could not be made much larger. A value of
$u_{\max} = 50$ was used in computing the AA policy. The object movement in Network A is parametrized by
a scalar $a \in [0, 1]$. Object 1 moves one cell to the left with probability a and one cell to the right with
probability 1 − a in each time step. Object 2 does just the opposite, moving one cell to the left with
probability 1 − a and one cell to the right with probability a. Note that the closer a is to 0.5, the more
difficult it is to distinguish between the objects based on their movements. This means that by varying a we
can investigate the performance of our policies for various amounts of data association error.
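The state count quoted above can be checked with a short calculation. The sketch assumes the count is the number of unordered pairs of distinct locations among n + 1, times the number of quantization levels, plus n + 1 states in which both objects share a location; this reading reproduces the figure of 596 for n = 7 and 21 levels. The function name is illustrative.

```python
from math import comb


def aa_state_count(n, levels):
    """States visited by policy iteration under the AA assumption (assumed form).

    comb(n + 1, 2) unordered pairs of distinct locations, each paired with one
    of `levels` quantized values of d_k, plus n + 1 coincident-location states.
    """
    return comb(n + 1, 2) * levels + (n + 1)


print(aa_state_count(7, 21))
```

The quadratic growth in n is what limits the AA policy to small networks.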

We illustrate the performance of our policies for Network A for the cases a = 0.55, a = 0.75, and
a = 0.95 in Figure 4.9. Curves are shown for the QMDP and AA lower bounds as well as the QMDP, FCR,
and AA policies. The curves are tradeoff curves that examine the tradeoff between energy cost and tracking
cost as the parameter c is varied. In examining the tradeoff curves, the distance from the right-hand point
of each curve to the x-axis is the average number of data association errors when all sensors are awake. No
tracking error smaller than this can be achieved. From the figures we can draw the following conclusions:

Figure 4.9: Performance of QMDP, FCR, and AA policies for Network A for various values of a: (a) a = 0.55,
(b) a = 0.75, (c) a = 0.95. The horizontal axis of each panel is the number of sensors awake per unit time.

•  The lower bound due to the QMDP assumption is tight when only a few sensors are awake (large c).
This is because the QMDP assumption incorporates only observation errors, and when few sensors are
making observations, observation errors dominate.

•  The lower bound due to the AA assumption is tight only when many sensors are awake (small c). This
is because the tracking cost approximation used in computing the AA policy is loose when neither
object is observed.

•  The QMDP policy performs best when only a few sensors are awake, and the AA policy performs best
when many sensors are awake. Not surprisingly, these are the same regions where their bounds are
tight.

•  The FCR policy is the worst-performing policy. The difference between the FCR policy and the other
policies, while never especially large in terms of the tradeoff curves, shrinks as data association errors
become small.

We now turn our attention to simulating the FCR policy, which scales better than the other policies, for
a larger network, termed Network B. Network B is a one-dimensional network with 41 sensors. Note that
policy iteration for the QMDP policy would need to be performed over $(n+1)^2 = 1764$ states and the
requirements for the AA policy would be even larger. The distributions for the movement of the objects
are given in Table 4.3 and illustrated graphically in Figure 4.10(a). Since no lower bounds are available
for Network B, we compare the performance of our FCR policy to a duty cycle policy, where each sensor
comes awake with some fixed probability at each time step. Figure 4.10(b) shows tradeoff curves for these
two policies. The tradeoff curve for the duty cycle policy is generated by varying the probability that a
sensor is awake between 0 and 1. The FCR policy significantly outperforms the duty cycle policy.

Table 4.3: Object movement distributions for Network B.

Change in Position        -2      -1      0       +1      +2
Probability for Object 1  0.2482  0.0568  0.3205  0.2633  0.1112
Probability for Object 2  0.1641  0.3395  0.1566  0.0633  0.2765

Figure 4.10: Tradeoff curves and object movement distribution for Network B. (a) Object movement
distributions for Network B. (b) Tradeoff curves for FCR and duty cycle policies for Network B.
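A rough sketch of how one point on a duty cycle tradeoff curve could be simulated is given below. It uses the movement distributions of Table 4.3 and the 41 sensors of Network B, but several details are assumptions made for illustration: the starting locations, the cap on the number of steps, and the fact that only observation errors (an object sitting under a sleeping sensor) are counted, with data association errors ignored.

```python
import random

STEPS = [-2, -1, 0, 1, 2]
P_OBJ1 = [0.2482, 0.0568, 0.3205, 0.2633, 0.1112]  # Table 4.3, Object 1
P_OBJ2 = [0.1641, 0.3395, 0.1566, 0.0633, 0.2765]  # Table 4.3, Object 2
N = 41  # number of sensors in Network B


def in_network(b):
    return 1 <= b <= N


def duty_cycle_run(rho, rng, start=(21, 21), max_steps=1000):
    """Return (sensor-awake count, missed-object count) for one simulated run.

    rho is the per-sensor wake probability; `start` and `max_steps` are
    hypothetical settings, not values taken from the report.
    """
    locs = list(start)
    energy = misses = 0
    for _ in range(max_steps):
        # Objects that have left the network stay outside.
        locs = [b + rng.choices(STEPS, weights=w)[0] if in_network(b) else b
                for b, w in zip(locs, (P_OBJ1, P_OBJ2))]
        if not any(in_network(b) for b in locs):
            break  # both objects have left the network
        awake = [rng.random() < rho for _ in range(N)]
        energy += sum(awake)
        misses += sum(1 for b in locs if in_network(b) and not awake[b - 1])
    return energy, misses
```

Sweeping `rho` from 0 to 1 and averaging over many runs traces out a curve of the kind the duty cycle policy produces; a location-aware policy such as FCR can do better because it concentrates wake-ups where the objects are likely to be.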

2.  Sensor  Scheduling 

Here, we consider a scheduling variant of the problem which can be thought of as a sleeping problem with
an external wake-up mechanism, i.e., sensors can be woken up by external means (e.g., a low-power
wake-up radio). At time k, the permissible control actions for an n-sensor scheduling problem are
n-dimensional binary vectors, i.e., vectors in $\{0,1\}^n$ (corresponding to the set of sensor nodes to activate
at each time step), in contrast to vectors in $\mathbb{N}_0^{n_a(k)}$ for the sleeping problem (corresponding
to the sleep durations of awake sensors), where $\mathbb{N}_0$ is the set of non-negative integers and $n_a(k)$
is the number of awake sensors at time k. The simpler structure of the control space for the scheduling
problem does not address the combinatorial nature of the control space, yet it enables efficient approximate
solution methodologies [8].

Again,  we  adopt  a  bottom-up  approach  where  we  consider  a  range  of  sensing,  motion  and  cost  models 
with  increasing  levels  of  difficulty  and  devise  suboptimal  scheduling  policies  to  balance  the  tradeoff  between 
energy  expenditure  and  tracking  performance.  In  some  cases  we  are  also  able  to  derive  lower  bounds  on  the 
optimal  energy-tracking  tradeoff.  In  addition  to  the  simple  sensing  model  of  Section  1.1,  we  also  consider 
more  generalized  models. 

2.1. Overlapping Sensors with Discrete Observation Models

In this model, we continue to use a discrete model for the target transition, but we define a new sensing
model and cost structure to account for the fact that sensors could have overlapping visibility regions. Within
that model we further consider simple and probabilistic sensing. Simple sensing refers to the case where the
target is perfectly observed within the visibility region of any active sensor. Therefore, a tracking error is
incurred if none of the sensors observing the current target location is active. Redefining the cost structure
for this model:

$$ g(b_k, u_{k-1}) = \mathbf{1}\{b_k \neq T\} \left( \mathbf{1}\{u_{k-1,j} = 0,\ \forall j \in \mathcal{B}_{b_k}\} + \sum_{\ell=1}^{n} c\, \mathbf{1}\{u_{k-1,\ell} = 1\} \right) \tag{4.53} $$
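The simple-sensing cost described above (one unit of tracking error when no sensor covering the target location is active, plus an energy cost c per active sensor) can be sketched as follows. The `coverage` mapping from a location to the set of sensors whose visibility regions contain it is a hypothetical data structure introduced for this example, as are the argument names.

```python
def stage_cost(b, u, coverage, c, terminal):
    """Per-step cost for simple sensing with overlapping visibility regions.

    b        -- current target location
    u        -- list of 0/1 sensor activations chosen at the previous step
    coverage -- dict: location -> set of sensor indices that can see it (assumed)
    c        -- energy cost per active sensor
    terminal -- label of the termination state, where no cost is incurred
    """
    if b == terminal:
        return 0.0
    # Tracking error if every sensor that can see location b is inactive.
    tracking_error = 1.0 if all(u[j] == 0 for j in coverage.get(b, ())) else 0.0
    energy = c * sum(u)
    return tracking_error + energy


coverage = {1: {0}, 2: {0, 1}, 3: {1}}
print(stage_cost(2, [0, 1], coverage, c=0.1, terminal="T"))
```

With overlapping regions a single active sensor can cover several locations, so the cost no longer separates cleanly into one term per sensor as it did in the non-overlapping model.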

By probabilistic sensing we account for observation uncertainty even if the target is within the visibility
region of one or more active sensors. In particular, we assume that the observation is uniformly distributed
over a location set $\mathcal{R}_k$ (other than the true target location) that belongs to the visibility regions of
the set of awake sensors monitoring the true location $b_k$. The number of these locations is a function of the
control $u_{k-1}$ and the object state $b_k$. If the true target location does not belong to the visibility region
of an awake sensor, we naturally exclude the visibility region of that sensor since no measurement is received
from such a sensor. When $\mathcal{R}_k$ is a singleton $\{b_k\}$, we set q = 1. A tracking error is incurred if
the target is not directly observed and the uncertainty in the target location cannot be resolved.

2.2.  Continuous  Observation,  Continuous  State  and  Arbitrary  Cost  Models 

In this class of models, we allow for an arbitrary distribution of the observations given the current object
location. Tracking cost is modeled through an arbitrary distance measure between the actual and the
estimated object location. If we denote the set of possible object locations by $\mathcal{B}$, we have
$|\mathcal{B}| = m + 1$. In contrast to the simplistic model, m is different from n since object locations are
arbitrary and we no longer assume one location corresponds to the sensing range of one particular sensor.
The (m+1)-th state again corresponds to a termination state. Furthermore, the target can be moving on a
continuous state space, in which case m is infinite.

For  simplicity  of  exposition,  we  focus  on  discrete  state  spaces.  Also,  we  omit  indexing  time  whenever 
the  time  evolution  is  well-understood  to  avoid  cumbersome  notation.  We  consider  the  following  observation 
model  for  illustration;  however,  our  approach  is  fairly  general: 

$$ p(s \mid b, u) = \prod_{i=1}^{n} \left\{ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( s_i - \frac{V}{(b - p_i)^2} \right)^2 \right) \mathbf{1}\{u_i = 1\} + \delta(s_i - \varepsilon)\, \mathbf{1}\{u_i = 0\} \right\} \tag{4.54} $$

where s is an n × 1 continuous observation vector with the i-th entry, $s_i$, representing the observation of
sensor i; $p_i$, i = 1, ..., n, is the position of the i-th sensor; b is the target state; V is some positive
constant; $\varepsilon$ stands for erasure; and $\delta(\cdot)$ is the Dirac delta function. In (4.54), the
observation of an active sensor is Gaussian with a mean received signal strength inversely proportional to
the square of the distance between the sensor and the actual target location. The observation of an inactive
sensor is just an erasure.

As for the sleeping problem, we define the tracking error through an arbitrary bounded distance function
$d(b, \hat{b})$ between the actual and the estimated object locations, which can be the Hamming distance or
the Euclidean distance for discrete and continuous state spaces, respectively.

2.2.1.  Approximate  Scheduling  Policies 

There are a number of algorithms for solving POMDPs exactly [31, 73, 156]. These algorithms rely on the
powerful result of Sondik that the optimal value function for any POMDP can be approximated arbitrarily
closely using a set of hyperplanes (α-vectors) defined over the belief simplex [156]. The result is a value
function parameterized by a number of hyperplanes (or vectors) whereby the belief space is partitioned into
a finite number of regions. Each vector minimizes the value function over a certain region of the belief space
and has a control action associated with it, which is the optimal control for the beliefs in its region.

To clarify, in value iteration we generally start with some initial estimate for $J^*$ and repeatedly apply the
transformation defined by the right-hand side of the Bellman equation until the sequence of cost functions
converges. Let $\{\alpha_i^{(k)}\}_{i=1}^{|J^{(k)}|}$ denote the set of vectors parameterizing the value function
$J^{*(k)}$ after k iterations, where $|J^{(k)}|$ is the total number of hyperplanes, and $\alpha_i^{(k)}(b)$, which
is a hyperplane in the belief space, represents the value of executing the k-step policy associated with the
i-th vector starting from a state b. Hence, the value of executing the i-th hyperplane policy starting from a
belief state p is simply the dot product of $\alpha_i^{(k)}$ and p:

$$ J_i^{(k)}(p) = \sum_{b} p(b)\, \alpha_i^{(k)}(b) = p \cdot \alpha_i^{(k)} $$

Therefore, the value of the optimal k-step policy starting at p is simply the minimum dot product over
all hyperplanes, i.e.,

$$ J^{*(k)}(p) = \min_{i} \; p \cdot \alpha_i^{(k)} $$
Hence, $J^{*(k)}(p)$ is piecewise linear and concave. Some of the vectors (also known as policy trees) may
be dominated by others in the sense that they are not optimal at any region in the belief simplex. Thus, many
exact algorithms devise pruning mechanisms whereby a parsimonious representation with a minimal set of
non-dominated hyperplanes is maintained.
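The representation above can be illustrated with a small sketch that evaluates the value function as a minimum of dot products and removes vectors that are pointwise dominated. The pointwise test is only a sufficient condition chosen for brevity; exact algorithms typically also solve linear programs to detect vectors that are dominated only by combinations of other vectors. The function names are illustrative.

```python
def value(belief, alphas):
    """Cost-minimizing value: J(p) = min_i p . alpha_i."""
    return min(sum(p * a for p, a in zip(belief, alpha)) for alpha in alphas)


def prune_pointwise(alphas):
    """Drop vectors that another vector matches or beats in every state.

    Exact duplicates are kept once (the first occurrence survives).
    """
    kept = []
    for i, a in enumerate(alphas):
        dominated = any(
            j != i
            and all(bj <= aj for bj, aj in zip(b, a))
            and (b != a or j < i)
            for j, b in enumerate(alphas)
        )
        if not dominated:
            kept.append(a)
    return kept


alphas = [[1.0, 3.0], [3.0, 1.0], [3.0, 3.5], [1.0, 3.0]]
print(prune_pointwise(alphas))
print(value([0.5, 0.5], alphas))
```

Pruning never changes the value at any belief, since a removed vector is never the strict minimizer of the dot product.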

Even though the aforementioned linearity/concavity property makes the policy search a great deal simpler,
the exact computation is generally intractable except for relatively small problems. The two major
difficulties for exact computation arise from the exponential growth of the vectors with the planning horizon
and with the number of observations, and the inefficiencies related to identification of such vectors and
subsequently pruning them. Namely, the number of hyperplanes grows doubly exponentially such that after k
steps the number of hyperplanes is $O\left(|U|^{|S|^{k-1}}\right)$, where $|U|$ and $|S|$ denote the cardinality of
the control and observation spaces, respectively. Equivalently, the number of hyperplanes per iteration
grows as:

$$ |J^{(k+1)}| = O\left( |U|\, |J^{(k)}|^{|S|} \right) $$

This has led to a number of approximations and suboptimal solution techniques that trade off solution
quality for speed.
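To give a feel for this growth, the following sketch iterates the worst-case recursion $|J^{(k+1)}| = |U|\,|J^{(k)}|^{|S|}$ with no pruning. Starting from $|U|$ vectors after the first backup is an assumption made for the illustration, and the function name is hypothetical.

```python
def hyperplane_counts(num_controls, num_observations, horizon):
    """Worst-case alpha-vector counts for horizons 1..horizon (no pruning).

    Assumes one vector per control after the first backup, then applies
    count <- num_controls * count ** num_observations at each iteration.
    """
    counts = [num_controls]
    for _ in range(horizon - 1):
        counts.append(num_controls * counts[-1] ** num_observations)
    return counts


print(hyperplane_counts(2, 2, 4))
```

Even with two controls and two observations the count passes thirty thousand after four steps; for a scheduling problem with $2^n$ controls the first term alone is already exponential in the number of sensors.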


Remark  2.1.  The  intractability  of  the  optimal  solution  for  our  problem  is  primarily  due  to  the  following 
reasons: 


(i)  The cost function is minimized over the simplex of probability distributions, i.e., the
(m−1)-dimensional belief simplex for m-state discrete state-space models, and the space of probability
density functions for continuous state-space models.

(ii)  The exponential explosion of the action space with the number of sensors ($2^n$ actions).

(iii)  The exponential growth of the α-vectors with the planning horizon and with the number of
observations, especially for continuous observation models.

In addition to the QMDP strategy, we also develop sensor scheduling strategies based on point-based
approximations. Despite the fact that the generated QMDP-based policies perform reasonably well, the
resulting policies generally do not take actions to gain information (an effect of the
observable-after-control assumption), leading to situations wherein the belief state does not get updated
appropriately. Furthermore, while decoupling the scheduling problem provides close-to-optimal performance
for uncoupled or lightly-coupled sensing and tracking models (see Section 3), it might come at the expense
of a reduction in solution quality for more realistic or heavily-coupled models. While our previous approach
reduced complexity via decoupling and learning, the key idea here is to optimize the value function only for
a small set of reachable beliefs V and not over the entire belief simplex. Developing a class of point-based
algorithms, which mostly differ in the way the subset of belief points is chosen and the execution order of
the backup operations over the selected belief points, has been the focus of recent algorithm-development
research targeting large-scale POMDPs. These algorithms were designed to deal with large state spaces, yet
two extra difficulties in the scheduling problem arise from the size of the action space (which is $2^n$ for
all models) and the observation space (for the models in Section 2.2). Regarding the dimensionality of the
action space, we devise a strategy to sample actions based on the support of the beliefs and the sparse
structure of the transition models. Intuitively speaking, an object can only move from one side of the network
to the other side within time constraints, rendering exponentially many scheduling actions irrational at
certain times. Hence, instead of performing full updates including $2^n$ actions, we perform the minimization
over a reduced control space U(p) for every $p \in V$. When dealing with continuous or large observation
spaces, we combine that with a methodology that aggregates observations and uses aggregate observations
for value iteration updates. At the core of the algorithm we use Perseus [157], a variant of PBVI [122],
whereby value iteration updates are not carried out for every sampled belief. Instead, the values for many
belief points are improved simultaneously in one update. Figure 4.11 depicts the structure of our point-based
approximation, combining control space reduction and observation aggregation with point-based updates.
Earlier, we described QMDP-based policies, whereby issues (i) and (iii) in Remark 2.1 are resolved since
we only needed to solve the underlying Markov Decision Process to describe the full approximate surrogate
value function. Decoupling the problem into one-per-sensor subproblems (naturally or artificially) further
enabled us to address issue (ii). Yet, the resulting scheduling policies generally do not take control actions
to gain information.

Figure 4.11: Structure of the point-based scheduling approximation.
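One possible way to build a reduced control space from the support of a belief is sketched below. This is not the report's sampling rule, which is not spelled out in this section; it is a hypothetical construction in which only sensors that can see a location reachable in one transition step are ever activated. The `kernel` and `coverage` dictionaries and the function names are assumptions made for the example.

```python
from itertools import combinations


def predicted_support(belief, kernel, eps=1e-9):
    """Locations with non-negligible probability after one transition step."""
    nxt = {}
    for b, pr in belief.items():
        for b2, q in kernel[b].items():
            nxt[b2] = nxt.get(b2, 0.0) + pr * q
    return {b for b, pr in nxt.items() if pr > eps}


def reduced_controls(belief, kernel, coverage, n):
    """Enumerate activation vectors that only wake sensors covering the support."""
    support = predicted_support(belief, kernel)
    useful = sorted({j for b in support for j in coverage.get(b, ())})
    controls = []
    for k in range(len(useful) + 1):
        for subset in combinations(useful, k):
            controls.append(tuple(1 if j in subset else 0 for j in range(n)))
    return controls


# Toy example: 3 locations, 3 sensors, sparse transitions.
kernel = {1: {1: 0.5, 2: 0.5}, 2: {2: 0.5, 3: 0.5}, 3: {3: 1.0}}
coverage = {1: {0}, 2: {0, 1}, 3: {2}}
controls = reduced_controls({1: 1.0}, kernel, coverage, n=3)
print(len(controls), "of", 2 ** 3, "actions kept")
```

The reduced set still grows exponentially in the number of useful sensors, so for wide beliefs a further sampling step over this set would be needed.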

Instead of reducing complexity via artificial decoupling and learning, the key idea of point-based
approximate policies is to optimize the value function only for specific reachable sampled beliefs and not
over the entire belief simplex (addressing issue (i) in Remark 2.1). Due to the large size of the control space,
we also devise strategies to sample actions exploiting the sparsity of the beliefs and the problem structure
(to address issue (ii)). Moreover, observation aggregation is used for continuous observation models.
Furthermore, since Perseus updates are not carried out for every sampled belief and multiple belief points
are improved simultaneously, the number of α-vectors grows modestly with the number of iterations,
addressing issue (iii) in Remark 2.1.

Figure 4.12 illustrates the progress of one iteration of Perseus. The x-axis represents the belief space
with circles representing the sampled belief set $V = \{p_1, \ldots, p_7\}$. The y-axis is the value
function at consecutive iterations, i.e., $J^{(k)}$ (solid lines) and $J^{(k+1)}$ (dashed lines). The
figure displays the $\alpha$ vectors and different steps illustrating the progress of the algorithm. The
algorithm selects a belief point at random and updates the value function for that belief. Then a new
update is carried out for a belief point randomly selected from the set of remaining beliefs, i.e.,
beliefs which did not improve in the previous step. The algorithm repeats until all belief points have
improved. Solid lines represent the hyperplanes in the $k$-th iteration and dashed lines represent the
newly added hyperplanes during the $(k+1)$-th iteration.
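The stage described above can be sketched in a few lines. This is a minimal illustration and not the report's implementation: it assumes a small discrete POMDP given by hypothetical arrays `T` (transition probabilities per action), `Z` (observation probabilities per action), and `g` (per-stage costs), a discount factor `gamma` that the report does not use, and a cost-minimizing value function $J(p) = \min_\alpha p \cdot \alpha$.

```python
import numpy as np

def backup(p, A, T, Z, g, gamma):
    """Point-based backup at belief p (cost-minimizing form).

    T[u][s, s2] = P(s2 | s, u), Z[u][s2, o] = P(o | s2, u), g[u][s] = stage cost.
    All names are illustrative assumptions, not taken from the report.
    """
    best_alpha, best_val = None, np.inf
    for u in range(len(T)):
        alpha_u = np.array(g[u], dtype=float)
        for o in range(Z[u].shape[1]):
            # Back-project every current vector through action u and observation o,
            # then keep the projection with the smallest value at p.
            proj = [T[u] @ (Z[u][:, o] * a) for a in A]
            alpha_u = alpha_u + gamma * min(proj, key=lambda v: p @ v)
        if p @ alpha_u < best_val:
            best_alpha, best_val = alpha_u, p @ alpha_u
    return best_alpha

def perseus_stage(B, A, T, Z, g, gamma, rng):
    """One Perseus stage: back up randomly chosen beliefs until every belief improved."""
    old = [min(p @ a for a in A) for p in B]
    todo = list(range(len(B)))          # beliefs whose value has not improved yet
    new_A = []
    while todo:
        i = todo[rng.integers(len(todo))]
        alpha = backup(B[i], A, T, Z, g, gamma)
        if B[i] @ alpha > old[i]:
            # The backup did not help at this point: keep the best old vector.
            alpha = min(A, key=lambda a: B[i] @ a)
        new_A.append(alpha)
        todo = [j for j in todo
                if min(B[j] @ a for a in new_A) > old[j] + 1e-12]
    return new_A
```

Because one new vector usually improves many beliefs at once, `new_A` is typically much smaller than the belief set, which is the effect the text attributes to Perseus.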

2.2.2.  Sampling  Actions  Based  on  the  Support  of  the  Belief 

Note that the DP update equation involves a minimization over all control actions in $U$. Even though
one iteration of the algorithm is linear in the cardinality $|U|$ of the control space, $|U|$ itself is
exponential in the number of sensors, thus rendering the minimization infeasible for a relatively large
sensor network.

Figure 4.12: One iteration of Perseus illustrating the progress of the algorithm. The x-axis represents
the belief space with circles representing the sampled belief set $V = \{p_1, \ldots, p_7\}$. The y-axis
is the value function at consecutive iterations, i.e., $J^{(k)}$ and $J^{(k+1)}$. Solid lines represent
the hyperplanes in the $k$-th iteration and dashed lines represent the newly added hyperplanes during the
$(k+1)$-th iteration. (a) The initial value function $J^{(k)}$; (b) $p_1$ is randomly selected and a new
$\alpha$ vector is added to $J^{(k+1)}$. This update step only happens to improve $p_1$. Dark circles
represent belief points which did not yet improve; (c) $p_5$ is sampled and a new hyperplane is added
which improves the value for $p_2$ through $p_6$; (d) Only $p_7$ did not improve, thus $p_7$ is sampled
and a new hyperplane is added to $J^{(k+1)}$; (e) All belief points improved, $J^{(k+1)}$ is computed,
the iteration ends.

The idea here is to exploit the structure of the scheduling/tracking problem. Since the target transition
model is naturally sparse, we predict relatively small uncertainty regions for the target state at future
time steps. More specifically, for every belief point in $V$, we use prior information about the target
transition model to project the future state of the target. This is particularly useful when the current
belief vector is sparse, leading to more restricted uncertainty regions. Subsequently, we restrict our
attention to a significant subset of sensors, that is, the sensors relevant to the uncertainty region.
Hence, we only consider scheduling actions that combine a reduced number of sensors, which considerably
reduces the control space for every belief in $V$. If the number of significant sensors is still large,
we randomly sample actions from the reduced control space. Note that the same intuition extends to more
complex motion models wherein information about target speed, maneuver, and acceleration can be factored
in to define the future uncertainty regions. Hence, instead of performing full updates including $2^n$
actions, we perform the minimization over a reduced control space for every $p \in V$. Specifically, we
redefine the point update equation as

$$\alpha = \arg\min_{\{\alpha_u^p\}_{u \in U(p)}} p \cdot \alpha_u^p,$$

where $U(p)$ designates the reduced control space for the belief vector $p$. It is worth mentioning that
the observation and the cost models need to be computed on the fly for each sampled control action during
the algorithm implementation.
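One way to build such a reduced control space is sketched below. It is an illustration under stated assumptions, not the report's procedure: the uncertainty region is taken to be the smallest set of locations holding a fraction `mass` of the predicted belief, `coverage` is a hypothetical boolean sensor-by-location matrix, and `max_actions` caps the number of sampled actions.

```python
import itertools
import numpy as np

def reduced_control_space(p, P, coverage, mass=0.95, max_actions=32, rng=None):
    """Return scheduling actions (0/1 tuples, one entry per sensor) for belief p.

    p: belief over m locations, P: m x m transition matrix,
    coverage[l, b] is True if sensor l can observe location b.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng if rng is not None else np.random.default_rng()
    pred = p @ P                                  # one-step predicted belief
    order = np.argsort(pred)[::-1]                # most likely locations first
    cum = np.cumsum(pred[order])
    k = min(int(np.searchsorted(cum, mass * cum[-1])) + 1, len(pred))
    region = order[:k]                            # predicted uncertainty region
    # Sensors that can see at least one location of the region.
    relevant = np.flatnonzero(coverage[:, region].any(axis=1))
    r = len(relevant)
    if 2 ** r <= max_actions:
        masks = list(itertools.product([0, 1], repeat=r))
    else:
        # Too many combinations: draw up to max_actions distinct random subsets.
        masks = list({tuple(int(x) for x in rng.integers(0, 2, size=r))
                      for _ in range(max_actions)})
    actions = []
    for mask in masks:
        u = np.zeros(coverage.shape[0], dtype=int)
        u[relevant] = mask                        # all other sensors stay asleep
        actions.append(tuple(int(x) for x in u))
    return actions
```

For a sparse belief the list returned here is far shorter than the $2^n$ actions of the full control space, since sensors far from the predicted region are never considered.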

2.2.3.  Observation  Aggregation 

The point update equation involves back-projecting all hyperplanes in the current iteration one step from
the future and returning the vector that minimizes the value of the belief. Since this involves computing
a cross sum by enumerating all possible combinations of $\alpha$ vectors for the different observations, a
number of vectors which is exponential in the number of observations is generated at each stage. The
recursion has to be redefined to address continuous observation models. It is not hard to see that if
different observations map to the same minimizing hyperplane, then they can be aggregated. Hence, if we
can partition the observation space into regions (possibly non-contiguous) that map to the same
hyperplane, the continuous model is reduced to a corresponding discrete model. Integration is replaced by
a summation over these partitions and the weighting probabilities are obtained by integrating the
conditional density over these partitions. This is clarified in the following:

$$\int_s \min_i \sum_{b'} p(s \mid u, b') \sum_b p(b' \mid b)\, p(b)\, \alpha_i(b')\, ds = \sum_j \sum_{b'} [pP]_{b'}\, P[S_j \mid u, b']\, \alpha_j(b'). \tag{4.55}$$

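To find the regions $S_j$ of aggregate observations, we need to solve for the boundaries, i.e., for each
pair $(i, j)$ of $\alpha$ vectors we need to solve for $s$:

$$\alpha_i \cdot \phi(p, u, s) = \alpha_j \cdot \phi(p, u, s), \tag{4.56}$$

where $\phi(p, u, s) = p^{s}(b') \propto \sum_b p(b)\, p(s \mid b', u)\, p(b' \mid b)$.

Hence, we need to solve:

$$\sum_{b'} \left(\alpha_j(b') - \alpha_i(b')\right) [pP]_{b'}\, p(s \mid u, b') = 0, \tag{4.57}$$

with $p(s \mid u, b')$ the Gaussian observation density of (4.54). After solving for the boundaries, we
can readily define the regions:

$$S_{j^*} = \{\, s \mid j^* = \arg\min_j \alpha_j \cdot \phi(p, u, s) \,\}. \tag{4.58}$$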

Now the update step is simply:

$$J(p) = g(p, u^*) + \sum_j \sum_{b'} [pP]_{b'}\, P[S_j \mid u^*, b']\, \alpha_j(b'), \tag{4.59}$$

where

$$P[S_j \mid u^*, b'] = \int_{s \in S_j} p(s \mid u^*, b')\, ds.$$

Finding a closed-form analytical solution for (4.57) is not feasible. Instead, we use Monte Carlo
simulations to solve for the boundaries and get estimates of the weighting probabilities by sampling
observations from $p(s \mid u, b')$ for different combinations of actions and target states.
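The Monte Carlo step can be sketched as follows for one fixed action. This is an assumption-laden illustration rather than the report's code: `mean_fn` is a hypothetical function returning the noise-free sensor readings for a target state, the noise is taken to be i.i.d. Gaussian with standard deviation `sigma`, and each sampled observation is assigned to the $\alpha$ vector with the smallest value at the (unnormalized) posterior belief.

```python
import numpy as np

def aggregate_weights(p, P, alphas, mean_fn, sigma, n_samples=50, rng=None):
    """Estimate W[j, b] ~ P[S_j | u, b] by sampling observations for each state b.

    mean_fn(b) gives the mean observation vector when the target is at b
    (an assumed stand-in for the Gaussian model; not the report's (4.54)).
    """
    rng = rng if rng is not None else np.random.default_rng()
    prior = p @ P                                        # [pP]_{b'}
    m = len(prior)
    means = np.array([mean_fn(b) for b in range(m)])     # shape (m, d)
    alpha_mat = np.array(alphas)                         # shape (J, m)
    W = np.zeros((len(alphas), m))
    for b_true in range(m):
        s = means[b_true] + sigma * rng.standard_normal((n_samples, means.shape[1]))
        # Gaussian log-likelihood of every sample under every candidate state.
        ll = -((s[:, None, :] - means[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2)
        # Unnormalized posterior; scaling by a positive constant does not change argmin.
        phi = prior * np.exp(ll - ll.max(axis=1, keepdims=True))
        j_star = np.argmin(phi @ alpha_mat.T, axis=1)    # minimizing hyperplane per sample
        W[:, b_true] = np.bincount(j_star, minlength=len(alphas)) / n_samples
    return W
```

Each column of `W` is an empirical distribution over aggregate observations, so it can replace the integral over $s$ by a finite sum in the update step.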

Akin to the sleeping problem, we are able to derive lower bounds on the energy-tracking tradeoff for the
simple as well as the continuous Gaussian observation models. For the simple model the QMDP value
function is itself a lower bound on the expected total cost since more information is available to the
controller at future time steps given the reduced uncertainty assumption. To obtain a lower bound on the
optimal energy-tracking tradeoff for such models, we combine the observable-after-control assumption with
a decomposable lower bound on the tracking cost as we presented for the sleeping problem.

2.2.4.  Results 

In this section, we show experimental results illustrating the performance of the proposed scheduling
policies for the different models considered in this paper. For the planning phase in the case of
point-based policies, beliefs are sampled by simulating multiple object trajectories through the sensor
network. Each trajectory starts from a random state sampled from the initial belief and picks actions at
random until the target leaves the network.

First, we consider the simple model with a linear network of 41 sensors. The object can move anywhere
from three steps to the left to three steps to the right in each time step. The distribution for these
movements is given in Table 4.4. Figure 4.13(a) shows the tradeoff curve between the number of active
sensors per unit time and the tracking error per unit time using the point-based and the QMDP policies.
The figure also shows a lower bound on the optimal performance. Both policies lead to tradeoffs that
closely approach the lower bound. The QMDP policy gets even closer to the lower bound at small tracking
errors since the observable-after-control assumption is more meaningful in this regime. In Figure 4.13(b)
we show convergence results for the point-based algorithm with reduced control space minimization. The
top left subplot displays the convergence of the sum cost of all the belief points in $V$; the top right
shows the expected cost averaged over many trajectories; the bottom left subplot shows the number of
hyperplanes constituting the value function as a function of time; the bottom right subplot shows the
number of policy changes versus time, i.e., the number of belief points for which the optimal action
changed over two consecutive iterations of the algorithm.

Table 4.4: Object movement for a network of 41 sensors with simple cost and sensing models.

Change in Position     -3     -2     -1      0      1      2      3
Probability          0.23   0.10   0.01   0.33   0.06   0.05   0.22

Figure 4.13: Tradeoffs and convergence for a 1-D network of 41 sensors with the simple sensing and cost
model. (a) Energy-tracking tradeoff for a one-dimensional network of 41 sensors with the simplistic
sensing and cost model. (b) Convergence results for the point-based algorithm for a one-dimensional
network of 41 sensors with the simplistic sensing and cost model.

Figure 4.15 displays the average cost and the tradeoff curves for the network in Figure 4.14 with a
probabilistic observation model. The cost per unit time is the average ratio of the total energy plus
tracking cost to the time the object spends in the network before reaching the termination state. The
network is composed of 12 sensors and 20 object locations with the shown connectivity such that the
observation ranges of the different sensors overlap. The object moves according to a random walk anywhere
from three steps to the left to three steps to the right in each time step. The distribution of these
movements is given in Table 4.1. For the locations close to the boundaries, i.e., when fewer than three
steps are available on the right or left, the remaining probability is absorbed in the transition to the
termination state. Since the tracking error for this model is inherently coupled across sensors, the
global point-based policy clearly outperforms the learning-based QMDP policy.


Figure 4.14: A sensor network with overlapping sensing ranges (12 sensors and 20 object locations). An
edge connects a sensor to a given location if this location falls within the sensing range of that sensor.

Figure 4.15: Overlap model.

Figure 4.16: Continuous observation model: (a) Total cost versus energy cost per sensor; (b)
Energy-tracking tradeoff.

Next, we consider a network of 10 sensors where the object locations are the integers from 1 to 21. The
observation for each awake sensor is continuous and Gaussian as in (4.54) with $V = 10$. The locations of
the sensors are as given earlier in Table 4.2 and the object moves according to the random walk defined
in Table 4.1. For every object state and every scheduling action in the reduced control space, we sample
50 observations to construct estimates of the weighting probabilities and compute the aggregate
observation boundaries. Up to 32 actions are sampled from the reduced control space. In this setup, the
belief set consists of 500 sampled belief vectors and we assume a Hamming error cost. Figure 4.16 shows
the performance of the different policies for the continuous observation model. The point-based
scheduling policy outperforms the QMDP policy. We further show a lower bound on the optimal performance
tradeoff. The lower bound is loose, especially in the high tracking error regime, since the derived bound
on per-sensor tracking errors assumes all other sensors are awake. However, we can exactly compute the
saturation point for the optimal scheduling policy since every policy has to eventually meet the
all-asleep performance curve, shown in Figure 4.16(a), when the energy cost per sensor is high. At that
point, all sensors are inactive and hence the target estimate can only be based on prior information.
Our results are not restricted to 1-D networks but readily apply to 2-D networks. Namely, Figure 4.17
(right) shows the energy-tracking tradeoff of the QMDP and point-based policies in addition to a lower
bound on optimal performance for the 2-D network of Figure 4.17 (left) with continuous observations and
Hamming cost. The entries of the object transition matrix are generated randomly with the restriction
that at any state the object can only move to its neighboring locations or remain at its current state.
This simulation shows similar trends to the previously observed results. The point-based policies
outperform the QMDP approach at the expense of an increase in the offline computational complexity of the
planning phase. Furthermore, the lower bound is reasonably tight in the low tracking error regime.

Figure 4.17: (Left) 2-D network with 20 sensors (stars) and 25 possible object locations (squares).
(Right) Energy-tracking tradeoff of the QMDP and point-based scheduling policies for a 2-D network with
continuous observations and Hamming cost.

2.3.  Scheduling  in  Clutter 

Here, our goal is to design a central controller to schedule the sensors to track an object of interest
in the presence of false alarms (clutter) [7]. Non-linear filtering for tracking in cluttered
environments is particularly hard as it requires considering a large number of events due to the
so-called data association problem, and is hence computationally intensive. The presence of random
interference from nearby objects, false alarms, electromagnetic interference, etc. generally leads to
ambiguity in the origin of the sensor measurements, and hence it is crucial to associate the measurements
with their corresponding tracks. One simple and intuitive candidate solution for the association problem
is to choose the signal with the highest intensity among a set of validated measurements for the track
update and discard the others. This is known as the Strongest Neighbor Filter (SNF). The Nearest Neighbor
Filter (NNF) is another solution that uses the measurement closest to the predicted measurement obtained
through a prediction step of the track estimation filter. However, these algorithms start to fail when
the false alarm rate, or clutter density, increases. Alternatively, probabilistic data association (PDA)
for a single target in clutter uses all the validated measurements and does not discard any of them [13].
A proper weight, reflecting the association probability, is assigned to each measurement and the weighted
average of the validated innovations is used for the update.
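The weighted-innovation idea behind PDA can be illustrated with a scalar state update. This is only a sketch: the association probabilities `betas` and the filter `gain` are taken as given inputs (computing them is the substance of PDA and is not shown), all names are hypothetical, and the covariance update is omitted.

```python
import numpy as np

def pda_state_update(x_pred, z_pred, gain, measurements, betas):
    """Scalar PDA-style state update using a probability-weighted innovation.

    measurements: validated measurements z_1..z_M
    betas:        association probabilities beta_1..beta_M (assumed given);
                  the remaining mass 1 - sum(betas) is the event that no
                  measurement is target originated and contributes zero innovation.
    """
    measurements = np.asarray(measurements, dtype=float)
    betas = np.asarray(betas, dtype=float)
    innovations = measurements - z_pred          # one innovation per measurement
    combined = betas @ innovations               # weighted average innovation
    return x_pred + gain * combined
```

Unlike the SNF and NNF, no measurement is discarded here; an unlikely measurement simply enters with a small weight.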

While  most  of  the  existing  literature  on  target  tracking  in  clutter  has  focused  on  the  estimation  aspect  of 
the  tracking  problem  using  one  or  two  sensors,  the  primary  focus  of  our  work  is  on  the  design  of  efficient 
control  policies  organizing  the  activity  of  a  larger  network  of  sensors  in  the  presence  of  false  alarms.  We  cast 
the  scheduling  problem  as  a  Partially  Observable  Markov  Decision  Process  (POMDP),  and  devise  strategies 
whereby  the  sensors  are  activated  to  optimize  the  fundamental  tradeoff  between  energy  expenditure  and 
tracking  performance  in  the  presence  of  spurious  measurements  from  clutter. 

We focus on the design of scheduling policies rather than the tracking aspect of the problem. Following
the same bottom-up approach, we first considered a simplistic model for sensing with non-overlapping
sensing ranges. This assumption is then relaxed when we consider sensors with overlapping sensing regions.
Thus, if $s_k$ is the measurement vector at time $k$ and $s_{k,\ell}$ the observation of the $\ell$-th
sensor, then


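$$s_{k,b_k} = \begin{cases} 1, & \text{if } u_{k-1,b_k} = 1; \\ 0, & \text{if } u_{k-1,b_k} = 0, \end{cases} \tag{4.60}$$

and

$$s_{k,\ell} = \begin{cases} 1, & \text{w.p. } P_F \text{ if } u_{k-1,\ell} = 1; \\ 0, & \text{if } u_{k-1,\ell} = 0; \\ 0, & \text{w.p. } 1 - P_F \text{ if } u_{k-1,\ell} = 1, \end{cases} \quad \forall \ell \neq b_k. \tag{4.61}$$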


The clutter density is captured by the false alarm probability $P_F$ that an active sensor provides a
positive measurement. Clutter therefore leads to uncertainty about the origin of the measurements, which
could eventually lead to a loss in tracking performance. Proper countermeasures should take this into
consideration when designing a sensor scheduling policy.

2.3.1.  Overlapping  Sensing  Regions 

In this model, we allow the sensing regions to overlap. An example of this model is illustrated in
Figure 4.18 depicting a network of $n = 12$ sensors observing $m = 20$ potential object locations
according to the shown connectivity. If $B_{b_k}$ is the set of sensors observing the target at time $k$,
then the observation model of the $\ell$-th sensor is given by

$$P(s_{k,\ell} = 1 \mid b_k, u_{k-1,\ell} = 1) = \begin{cases} 1, & \text{if } \ell \in B_{b_k}; \\ P_F, & \text{if } \ell \notin B_{b_k}. \end{cases} \tag{4.62}$$

That is, when the target is in the vicinity of an active sensor, the sensor gets a positive observation;
however, active sensors which do not belong to the set $B_{b_k}$ could also falsely declare that a target
is present with probability $P_F$. This discrete model is simplistic, yet it captures essential features
of real sensing systems, namely, overlapping sensing ranges, limited visibility for each sensor, as well
as geographical neighborhood properties. In the presence of clutter, the estimation problem becomes more
involved. We have to adapt our filter to account for the uncertainty in the origin of sensor
measurements. We let $A(k)$ denote the set of active sensors declaring a target at time $k$, and
$A_i(k)$, $i = 1, \ldots, |A(k)|$, its $i$-th element. Now define the events


$$\theta_i(k) = \{ s_{k,A_i(k)} \text{ is target originated} \}, \quad i = 1, \ldots, |A(k)|,$$
$$\theta_{|A(k)|+1}(k) = \{ \text{target reaches termination state } \tau \} \tag{4.63}$$

and

$$\theta_0(k) \triangleq \bigcap_i \theta_i^c(k) \tag{4.64}$$

with probabilities

$$\beta_i(k) = P(\theta_i(k) \mid I_k), \quad i = 0, \ldots, |A(k)| + 1, \tag{4.65}$$

where $\theta_i^c(k)$ denotes the complement of the event $\theta_i(k)$. Therefore, $\theta_0(k)$ refers
to the event where none of the measurements at time $k$ is target-originated. Hence, the new belief at
time $k$ can be written as

$$p_k = \beta_0(k)\, [p_{k-1} P]_{\{j : u_{k-1,j} = 0\}} + \sum_{i=1}^{|A(k)|} \beta_i(k)\, e_{A_i(k)} + \beta_{|A(k)|+1}(k)\, e_\tau. \tag{4.66}$$

Figure 4.18: Network with overlapping sensor regions. 12 sensors (shown as circles) cover 20 locations
(squares). An edge connects a circle and a square when the location falls within the sensing region of
that sensor.
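The mixture form of the belief update (4.66) can be sketched as below for the simple model in which sensor $j$ sits at location $j$. Several things here are assumptions made for illustration: the association probabilities are passed in as a list `betas` (their computation is not shown), the termination state is stored as the last entry of the belief, and the restricted prediction is renormalized over the locations whose sensors were asleep.

```python
import numpy as np

def clutter_belief_update(p_prev, P, u_prev, declaring, betas):
    """Mixture belief update in clutter, following the structure of (4.66).

    p_prev:    belief over m locations plus a termination state (last index)
    P:         (m+1) x (m+1) transition matrix
    u_prev:    length-m 0/1 vector, 1 if the sensor at that location was awake
    declaring: indices of awake sensors that declared a target, A(k)
    betas:     [no measurement target originated, one entry per declaring
                sensor, target terminated]; hypothetical input, assumed given
    """
    m = len(u_prev)
    pred = p_prev @ P
    asleep = np.zeros(m + 1, dtype=bool)
    asleep[:m] = np.asarray(u_prev) == 0
    # Prediction restricted to unobserved locations, renormalized (assumption).
    restricted = np.where(asleep, pred, 0.0)
    total = restricted.sum()
    if total > 0:
        restricted = restricted / total
    p_new = betas[0] * restricted
    for i, sensor in enumerate(declaring, start=1):
        p_new[sensor] += betas[i]                # point mass at the declaring sensor
    p_new[m] += betas[-1]                        # point mass at the termination state
    return p_new
```

When `betas` sums to one and some location is unobserved, the result is again a probability vector.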


Next we compute the association probabilities $\beta_i(k) = P(\theta_i(k) \mid I_k)$. Using Bayes' rule

m  =  4^(fc)l_1  n  ^j-1)  n  x  = Mk)\bk-i)pk-i,  i 

jeA(k)  i<£A(k)  bk-i 


(4.67) 


and 

fuk) = p]FAm  n  n  sM^\Pk-iP}{j:uk.ltj=o}  (4-68) 

jeA(k)  ££A(k) 


An analogous approach can be used to write the filtering equations for the overlap model, but the
evolution is generally more difficult to write mathematically in a compact form. Procedurally, it follows
exactly the same approach described above. All the hypotheses are first enumerated. Then, the evolution
is obtained as a weighted combination of the evolution under each individual hypothesis, where the
weights correspond to the association probabilities. However, the number of hypotheses in this case is
significantly larger since under one hypothesis it might very well be the case that multiple measurements
from awake sensors are target originated because the sensing regions of different sensors overlap. The
number of hypotheses generally scales exponentially with the number of active sensors. To reduce the
space of association hypotheses, the controller could limit the maximum number of sensors to be activated
at every time step to a relatively small number, say $n_1$ sensors. In our simulation results, we choose
$n_1 = 5$, i.e., at most 5 sensors can be active at a given time instant. Our proposed point-based
scheduler approximates the optimal solution using a point-based approximation driven by the non-linear
filters described previously.


3.  Simulation  Results 

In  this  section,  we  show  experimental  results  illustrating  the  performance  of  the  proposed  scheduling  policies 
for  the  different  models. 

First, to illustrate some of the basic ideas, consider a simple linear network with 11 sensors where the
object moves according to a symmetric random walk either one step to the left or one step to the right.
We term this network Net A. Figure 4.19(a) shows the tradeoff curves between the number of active sensors
per unit time and the tracking error per unit time using our point-based scheduler for different levels
of clutter density for Net A. As expected, for the no-clutter case in the low tracking error regime,
i.e., at vanishing energy cost per sensor, activating one sensor to the left or to the right of the
object's last known location is enough to perfectly track the target. The reason is that, at each time
step, the target is either perfectly observed (by an awake sensor) or its position can be exactly
inferred. Hence, the point-based scheduler in this case converges to the optimal scheduling policy. As
the clutter density increases, the tracking error increases for the same number of active sensors. The
figure illustrates the performance of our point-based scheduler for different clutter densities, namely,
when an active sensor measures clutter 10% and 20% of the time.

Figure 4.19: (a) Energy-tracking tradeoff for different clutter densities for Net A; (b) Number of active
sensors versus clutter density for different energy costs per sensor for Net A.


Figure 4.19(b) shows the number of active sensors versus the clutter density for fixed energy costs per
sensor $c = 0.1$ and $c = 0.2$. Indeed, for the no-clutter case, only one sensor is activated. This is
optimal as long as $c < 0.5$ since the energy cost is smaller than the expected tracking cost of 0.5. As
we increase the clutter density, the scheduler chooses to activate more sensors to compensate for the
uncertainty associated with the origin of the measurements, up to a point where the clutter density is so
high that the tracking performance approaches that of uninformed tracking based solely on the available
knowledge about the propagation model and the clutter model. In this case, activating sensors is of no
avail and the scheduler judiciously chooses to disengage all the sensors to avoid unnecessary energy
expenditure. Not surprisingly, the x-axis intercept, i.e., the cutoff clutter density at which all
sensors are deactivated, is larger for smaller values of $c$ since turning on more sensors is less costly.
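The $c < 0.5$ threshold can be checked with a one-step comparison. The sketch below rests on assumptions not spelled out in the text: a unit (Hamming) cost per tracking error, no clutter, and an object position that is known at the previous time step, so that the object is at one of two neighboring locations with probability 1/2 each.

```latex
\begin{align*}
\text{all sensors asleep:} \quad & E[\text{cost}] = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1 = 0.5, \\
\text{one neighboring sensor awake:} \quad & E[\text{cost}] = c + 0 = c .
\end{align*}
```

With one neighbor awake the object is either detected or known to be at the other neighbor, so the tracking error vanishes and only the energy cost $c$ remains; activating that sensor is therefore preferable whenever $c < 0.5$.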

Second, consider the scenario where the object is moving according to a random walk anywhere from three
steps to the left to three steps to the right in each time step as in Table 4.1. Figure 4.20 illustrates
the energy-tracking tradeoff of the proposed policies for different levels of clutter density. The
degradation in performance with respect to the no-clutter case is graceful at low and moderate clutter
densities.

Next we consider a network where the sensing regions of different sensors overlap. The network consists
of 20 possible object locations monitored by 12 sensors as shown in Figure 4.18. The total cost per unit
time versus the energy cost $c$ is shown in Figure 4.21(a) for different levels of clutter density. All
the curves saturate when the energy cost is too high and the scheduler disengages all the sensors. In
this case, the total cost is due to the tracking cost based solely on the prior information. Not
surprisingly, the saturation point occurs at smaller $c$ for higher values of the clutter density. For
moderate values of the clutter density (e.g., 5%), the gap between the saturation points of the cluttered
case and the no-clutter case is small, showing that through judicious use of scheduling actions we are
able to compensate for the uncertainty due to clutter. Figure 4.21(b) shows the tradeoff curves for this
overlap network.

Figure 4.20: Energy-tracking tradeoff for different clutter densities for Net B.

Figure 4.21: (a) Total cost per unit time versus energy cost per sensor for the overlap network; (b)
Energy-tracking tradeoff for different clutter densities for the network with overlapping sensing ranges.

Chapter 5

Agile Sensors and Boundary Tracking


The  work  presented  in  this  chapter  has  been  done  by  the  group  of  Dr.  Bertozzi  (UCLA)  in  collaboration 
with  Dr.  Tartakovsky  (USC). 

1.  Agile  Sensors 

1.1.  Second  Generation  Testbed 

During summer 2006 we constructed the second generation of an economical cooperative control testbed.
The original car-based vehicle was improved with on-board range sensing, limited on-board computing, and
wireless communication, while maintaining economic feasibility and scale. A second, tank-based platform
uses a flexible caterpillar-belt drive and the same modular sensing and communication components. We
demonstrated practical use of the testbed for algorithm validation by implementing a recently proposed
cooperative steering law involving obstacle avoidance. This work was published in the proceedings of the
2007 American Control Conference.

1.2.  Boundary  Tracking 

Co-PI Bertozzi and postdoc Zhipu Jin developed a framework for environmental boundary tracking and
estimation by considering the boundary as a hidden Markov model (HMM) with separated observations
collected from multiple sensing vehicles. For each vehicle, a tracking algorithm is developed based on
Page's cumulative sum (CUSUM) algorithm, a method for change point detection, so that individual vehicles
can autonomously track the boundary in a density field with measurement noise. Based on the data
collected from sensing vehicles and prior knowledge of the dynamic model of boundary evolution, we
estimate the boundary by solving an optimization problem in which prediction and current observation are
both considered in the cost function. Examples and simulation results were presented to verify the
efficiency of this approach. This work was published in the 2007 IEEE Conference on Decision and Control.
The algorithm was implemented on the second generation testbed using a convoy of vehicles. Relative
positioning between vehicles allows several to maintain a convoy while tracking the boundary. The
algorithm performs well in the presence of moderate sensor noise. Some adaptation of the algorithm was
necessary to run it on a testbed with limited on-board computing. In particular, a modified Kalman filter
was implemented in which a constant gain was estimated a priori and used on the vehicle. The
implementation work was published in the 2009 IEEE American Control Conference.
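For readers unfamiliar with Page's CUSUM test, a minimal sketch follows. It is a textbook form and not the vehicles' tracking code: it assumes Gaussian measurement noise with known standard deviation `sigma`, a known mean `mu0` on one side of the boundary and `mu1` on the other, and a hypothetical alarm `threshold`.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Return the index of the first CUSUM alarm, or None if no alarm is raised.

    Detects a shift of the sample mean from mu0 to mu1 under Gaussian noise.
    All parameter names are illustrative assumptions.
    """
    w = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of one sample: N(mu1, sigma^2) versus N(mu0, sigma^2).
        llr = (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))
        # Page's recursion: accumulate evidence, resetting at zero.
        w = max(0.0, w + llr)
        if w >= threshold:
            return t
    return None
```

In a boundary-tracking setting the alarm would indicate that the vehicle has crossed from one side of the boundary to the other; a larger `threshold` trades detection delay for fewer false alarms under sensor noise.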

1.3.  Paper  Published  in  ICINCO  2010 

Bobby Liu, Martin Short, Yasser Taima, and Andrea Bertozzi developed a searching algorithm for a group of
agents moving in a swarm and sensing potential targets. The objective of the algorithm is to use these
groups to efficiently search for and locate targets with a finite sensing radius in some bounded area. We
present an algorithm that both controls agent movement and analyzes sensor signals to determine where
targets are located. We use computer simulations to determine the effectiveness of this collaborative
searching. A scaling analysis compares swarm size to proficiency of searching. This work has been
published in the Proceedings of the 7th International Conference on Informatics in Control, Automation,
and Robotics (ICINCO), Portugal, June 2010. This work is being transitioned to China Lake Naval Air
Warfare Center (see Tech Transfer section).

1.4.  Paper  Published  in  ICINCO  2011 

One-time step-up funds provided on this contract led to the design and building of new hardware for our
micro-car testbed at UCLA. The cars were designed and built by Artero's Lab, a small startup company
formed by former master's students working in Bertozzi's lab. The initial setup was tested and software
controllers were written by EE master's student David Hermina as part of his final project for the
degree. In summer 2010 a team of undergraduates developed algorithms to test the capabilities of the
platform. Their work led to a publication in ICINCO 2011 [61].

The micro-cars feature two on-board processors, each devoted to different tasks. The upper board is a
350MHz-rated Virtex-4 FPGA. It is dedicated to on-board algorithm processing as well as a user interface.
The main motivation for a powerful on-board processor is to increase the autonomy of each vehicle. In
previous iterations of the AML Testbed, vehicles relied on a third-party desktop computer to perform all
calculations. With this new processor, the cars can perform all required processing on-board. The lower
board is a 50MHz-rated ARM Cortex-3 microcontroller. It is dedicated to motion control and sensor
management. The lower board directly controls the vehicle’s motion by generating power modulation signals
that feed into the car’s motor and steering servo. It also gathers data from the various sensors using a
prioritized task controller. The micro-cars have a control system made up of two parts. The cars have a rear
wheel drive system with a maximum speed of 20 cm/s. The cars are steered by an axle-articulated servo
connected to the front wheels. The experimentally verified maximum turning angle is ±18 degrees. This
proved to be one of the largest limitations of the testbed, as the micro-cars can trace a circle with a minimum
diameter of 50cm. Because the increased computing capabilities could allow for more complex programs to
be run on board, the memory was also upgraded from the previous generation. The upper board has access
to both a 64MB DDR SDRAM module and a 4MB flash drive. The lower board’s microcontroller
houses internal 8KB SRAM and a 64KB flash drive. The lower board also contains a 1KB EEPROM module
to store system control parameters that are specific to each car, such as the car’s identification number, servo
gains, and the servo offset.

The sensor systems of the micro-cars are greatly expanded relative to the previous generations. Each car sports
a 640x480 digital camera, two high performance uni-directional gyroscopes, an optical encoder used for
velocity estimation, and an infrared sensor module. The digital camera is not currently functional but could
later be integrated to add advanced image processing capabilities to algorithms. The camera could be applied
to detect obstacles when coupled with the IR sensors or to locate destination points on the testbed. The
140Hz analog gyroscopes provide physical orientation sensing information. They are functional but are not
currently used in any experiments due to the completely flat nature of the testbed. Each car is equipped with
either a long-range or a short-range forward-facing infrared sensing module. The long-range sensors can
detect objects in the 10cm-140cm range and the short-range sensors can only detect from 10cm-80cm. The
IR sensors have been previously characterized in master’s student David Hermina’s thesis [102]. The IR
sensors have been used in several algorithm tests to facilitate an emergency stop protocol and basic barrier
avoidance.

The cars have been tested with respect to a steering control algorithm originally proposed by Justh
and Krishnaprasad and modified by Morgan and Schwartz for obstacle avoidance [111]; this algorithm was
featured on the second generation testbed [85]. In our new work we test peer-to-peer networking in which
communication is no longer all-to-all but rather uses low bandwidth information sharing through subnetworks.
One example, shown in Figure 5.1, uses a daisy-chain network to perform the steering control algorithm originally
written for all-to-all coupling.

Figure 5.1: Three frames of an experimental run using daisy chain coupling. The cars are originally separated
in two groups (top left). During the run, the cars regroup (top right) and reach a common orientation before
exiting the testbed (bottom). Figure from [71].


2.  Image  Segmentation  Through  Efficient  Boundary  Sampling 

Segmentation  is  one  of  the  most  important  problems  in  image  processing.  Partitioning  an  image  into  a 
small  number  of  homogeneous  regions  highlights  important  features,  allowing  a  user  to  analyze  the  image 
more  easily.  Applications  include  medical  imaging,  computer  vision,  and  geospatial  target  detection.  Image 
segmentation  methods  can  be  subdivided  into  region-based  vs.  edge-based  methods.  Region-based  methods 
include  the  Mumford-Shah  and  related  Chan-Vese  methods  which  both  involve  energy  minimization  with 
a  least  squares  fit  of  the  data  and  a  partition,  between  regions,  whose  length  is  minimized.  Edge-based 
methods  include  the  well-known  image  snakes  and  Canny  edge  detector.  Other  approaches  to  segmentation 
have  also  been  effective.  Statistical  methods  such  as  region  competition  rely  on  the  fact  that  images  have 
repetitive  features  that  can  be  learned  and  exploited  to  obtain  a  segmentation.  A  more  recent  fast  statistical 
method  called  Distance  Cut  is  semi-supervised  (the  user  identifies  segments  in  each  region)  and  is  based  on 
weighted  distances  and  kernel  density  estimation. 

All  of  these  methods  involve,  at  some  level,  sampling  all  the  pixels  in  an  image.  For  applications 
involving  high  dimensional  or  large  data  sets,  it  makes  sense  to  subsample  the  image.  This  is  especially 
important  for  high  resolution  data  where  it  can  be  prohibitive  to  perform  calculations  on  every  pixel  in  the 
image. Bertozzi, Tartakovsky, Chen, and Wittman develop a segmentation method that is designed for this kind
of application and is based on prior work for cooperative environmental sampling with robotic vehicles. The
algorithm has two levels, namely a global searching method, which locates a boundary point, and a local
sampling algorithm, which tracks the boundary using the global method as an initial point. Occasionally, if
the tracker strays too far from the boundary, additional uses of the global algorithm are needed. The local
algorithm is based on CUSUM statistics (see Figure 5.2). This work has been published in AMRX [30].

Figure 5.2: A 100 x 100 image was corrupted with additive Gaussian noise, N(0,0.5). Left: Boundary
tracking without a change-point detection modification. Middle: Boundary tracking with the CUSUM
algorithm. Right: Threshold dynamics - global segmentation method.
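To make the role of the CUSUM statistic concrete, here is a minimal sketch of a one-sided CUSUM detector for a shift in mean between two Gaussian levels. It is a generic textbook form under assumed parameters (`mu0`, `mu1`, `sigma`, `threshold` are hypothetical names), not the specific tracker of [30]; in a boundary-tracking setting the two levels would stand for pixel intensities on either side of the boundary.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Return the index at which the CUSUM statistic first reaches threshold, else None."""
    s = 0.0
    for i, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) for sample x.
        llr = (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))
        # Accumulate evidence for the change, resetting at zero.
        s = max(0.0, s + llr)
        if s >= threshold:
            return i
    return None
```

Because the statistic accumulates evidence over several samples, a single noisy reading does not by itself trigger a crossing decision, which is the property a noise-robust tracker needs.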




Chapter  6 

Detection  and  Tracking  of  Covert  Hostile 
Activities 


The research presented in this chapter has been performed by the group of Dr. Aram Galstyan at ISI in
collaboration with Dr. Paul Cohen. The major goal of this part of the project is to develop a scalable
probabilistic framework for detecting and tracking covert activities of hostile agents. This framework
will include an algorithmic toolkit for detecting and tracking hostile activities, methodology for analyzing
properties of those algorithms, and theoretical models that will address general questions such as
accuracy, trackability, and detectability. During this project, we have made significant progress in both
theoretical and computational aspects of the above problems. Below we summarize these achievements.

1.  Modeling  Activities  via  Hidden  Markov  Models 

We have suggested Event-Coupled Factorial HMMs (EC-FHMM) as a generic framework for modeling
activities in Hats and other domains. EC-FHMMs are different from more conventional HMMs with factorial
state representation in the way observations are generated. Specifically, what is observed in EC-FHMMs is
the interactions between different chains, as shown in Figure 6.1. In the context of the plan recognition
problem, each chain describes an agent, while observations imply that two chains have interacted at a given time
step. Thus, in contrast with previously studied factorial HMMs and dynamic Bayesian networks (DBN),
where the topology of coupling between the chains is predetermined, in our model the topology
is dynamic, which is well suited for capturing dynamically evolving networks. Furthermore, the interaction
between the chains is informed by the internal states of the nodes, while the states of those nodes themselves
are influenced by those interactions. This provides a feedback mechanism between individual and collective
dynamics, which translates into a very rich behavior of the model.

It is easy to see that after a sufficiently long time, exact inference is infeasible with even moderate
N. Thus, one needs to develop approximate methods for inference and learning. To this end, we have employed
the so-called Boyen-Koller factorization, which works by approximating the true belief states by a product
of belief states over smaller clusters of variables. This approximation has been shown to produce good
results for a number of problems. Importantly, it has been established that the accumulated error caused by
this approximation is bounded by the mixing rates of the underlying Markov dynamics.
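The projection step at the heart of this factorization can be sketched for the simplest case of two agents, each treated as its own cluster. The joint belief is replaced by the product of its marginals, and the KL divergence measures what is lost. This is an illustrative toy with hypothetical function names, not the implementation used in the project.

```python
import math
from itertools import product

def project_to_marginals(joint):
    """joint maps (a, b) -> probability; return the marginal belief of each agent."""
    ma, mb = {}, {}
    for (a, b), p in joint.items():
        ma[a] = ma.get(a, 0.0) + p
        mb[b] = mb.get(b, 0.0) + p
    return ma, mb

def product_belief(ma, mb):
    """Factored approximation: product of the two single-agent beliefs."""
    return {(a, b): ma[a] * mb[b] for a, b in product(ma, mb)}

def kl(p, q):
    """KL divergence D(p || q) between two beliefs over the same support."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

# Two correlated binary agents: the factored belief loses the correlation.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
approx = product_belief(*project_to_marginals(joint))
```

In a filtering loop this projection would be applied after every propagation step, so the belief state never grows beyond the product form.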

Clearly, the accuracy of the approximation depends on the choice of the cluster variables. Perfect
accuracy is recovered when all the coupled variables belong to a single cluster, while the most aggressive
approximation corresponds to the situation where each agent is represented as a separate cluster. We have
previously  shown  that  for  simple  Hats  scenarios  even  the  most  aggressive  approximation  (i.e.,  each  agent 
is  treated  as  a  separate  “cluster”)  produces  reasonable  results  provided  that  the  prior  knowledge  about  the 
agents  is  sufficiently  accurate. 

In this phase, we have extended our work by considering an iterative approach for the above collective
inference problem. The main premise behind the iterative scheme is the following: Assume we have used
the most aggressive factorization (one agent per cluster), and found the hidden state sequences for individual
agents. This can be done by a slightly modified Viterbi algorithm. Thus, if we have n agents, then after
one iteration we have estimated the hidden state sequences for those agents. Now, some of those estimates
will be more accurate, or more certain, than the others. Here the former means that the given hidden
state sequence has a higher likelihood under the assumed model parameters, while the latter means that the
entropy of the corresponding posterior distribution is low. Then, we can freeze the agent whose hidden state
sequence is the most accurate/certain, and repeat the iteration for the remaining n - 1 agents (in practice,
we can freeze not one but the top k agents). Indeed, our preliminary results suggest that this procedure
increases the accuracy over the original, one-shot approach to the inference.

Figure 6.1: Time-rolled diagram of an Event-Coupled Factorial HMM.
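The control flow of this freeze-and-repeat scheme can be sketched as follows. The `decode` callback is hypothetical: it stands in for the per-agent (modified Viterbi) inference and is assumed to return a state sequence together with a confidence score such as a likelihood or a negative entropy.

```python
def iterative_freeze(agents, decode, k=1):
    """Repeatedly decode the remaining agents and freeze the k most confident ones.

    decode(agent, frozen) -> (state_sequence, confidence) is a hypothetical
    callback; `frozen` holds the sequences already fixed in earlier rounds.
    """
    frozen = {}
    remaining = list(agents)
    while remaining:
        results = {a: decode(a, frozen) for a in remaining}
        # Rank the remaining agents by confidence, most certain first.
        ranked = sorted(remaining, key=lambda a: results[a][1], reverse=True)
        for a in ranked[:k]:
            frozen[a] = results[a][0]
        remaining = [a for a in remaining if a not in frozen]
    return frozen
```

Passing `frozen` back into `decode` is what lets later rounds condition on the agents already fixed, which is where any accuracy gain over the one-shot approach would come from.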

2.  Co-Evolving  Stochastic  Blockmodel  for  Dynamic  Networks 

In EC-FHMMs, each node is in a certain state, and interactions between different agents depend on those
states. Many situations, however, are better described by multi-faceted interactions, where nodes can bear
multiple latent roles that influence their relationships to others. The Mixed Membership Stochastic
Blockmodel (MMSB) accounts for such “mixed” interactions, by allowing each node to have a probability
distribution over roles, and by making the interactions role-dependent [4].

We have developed a model named Co-evolving Mixed Membership Stochastic Blockmodel, or CMMSB,
which provides a dynamic generalization of the mixed membership model by explicitly modeling the
variation in the node membership vectors. Previously, a dynamic extension of the MMSB (dMMSB) was
suggested in [57]. In contrast to dMMSB, where the dynamics was imposed externally, our model assumes that
the membership evolution is driven by the interactions between the nodes through a parametrized influence
mechanism. At the same time, the patterns of those interactions themselves change due to the evolution of
the node memberships.

Another  advantage  of  our  model  over  dMMSB  is  that  the  latter  models  the  aggregate  dynamics,  e.g., 
the  mean  of  the  logistic  normal  distribution  from  which  the  membership  vectors  are  sampled.  CMMSB, 
however,  models  each  node’s  trajectory  separately,  thus  providing  better  flexibility  for  describing  system 
dynamics. Of course, more flexibility comes at a higher computational cost, as CMMSB tracks the
trajectories of all nodes individually. This additional cost, however, can be well justified in scenarios where the
system as a whole is almost static (e.g., no shift in the mean membership vector), but different subsystems
experience  dynamic  changes.  One  such  scenario  that  deals  with  political  polarization  in  the  U.S.  Senate  is 
presented  in  our  experimental  results  section. 

Consider a set of N nodes, each of which can have K different roles, and let $\pi_p^t$ be the mixed membership
vector of node p at time t. Let $Y_t$ be the network formed by those nodes at time t: $Y_t(p,q) = 1$ if the nodes
p and q are connected at time t, and $Y_t(p,q) = 0$ otherwise. Further, let $Y_{0:T} = \{Y_0, Y_1, \ldots, Y_T\}$ be a time
sequence of such networks. The generative process that induces this sequence is described below.

• For each node p at time t = 0, employ a logistic normal distribution to sample a membership vector over the simplex.1

    $\pi^0_{p,k} = \exp(\mu^0_{p,k} - C(\mu^0_p)), \quad \mu^0_p \sim \mathcal{N}(\alpha^0, A)$

where $C(\mu) = \log(\sum_k \exp(\mu_k))$ is a normalization constant, and $\alpha^0$, $A$ are the prior mean and
covariance matrix.

• For each node p at time t > 0, the mean of each normal distribution is updated due to influence from
the neighbors at its previous step:

    $\alpha^t_p = (1 - \beta_p)\,\mu^{t-1}_p + \beta_p\,\mu_{S(p,t-1)}$

where $\mu_{S(p,t-1)}$ is the average of the weighted membership vectors $\mu$ of the nodes which node p has met at
time t - 1:

    $\mu_{S(p,t-1)} = \sum_q Y(p,q)\, w_{p \leftarrow q}\, \mu^{t-1}_q$

$\beta_p$ describes how easily the node p is influenced by its neighbors. The membership vector at time t is

    $\pi^t_{p,k} = \exp(\mu^t_{p,k} - C(\mu^t_p)), \quad \mu^t_p \sim \mathcal{N}(\alpha^t_p, \Sigma_\mu)$

where the covariance $\Sigma_\mu$ accounts for noise in the evolution process.

• For each pair of nodes p, q at time t, sample role indicator vectors from multinomial distributions:

    $z^t_{p \to q} \sim \mathrm{Mult}(z \mid \pi^t_p), \quad z^t_{p \leftarrow q} \sim \mathrm{Mult}(z \mid \pi^t_q)$

Here $z_{p \to q}$ is a unit indicator vector of dimension K, so that $z_{p \to q,k} = 1$ means node p undertakes
role k while interacting with q.

• Sample a link between p and q as a Bernoulli trial:

    $Y_t(p,q) \sim \mathrm{Bernoulli}(y \mid (1 - \rho)\, z^{t\top}_{p \to q} B_t\, z^t_{p \leftarrow q})$

where B is a K x K role-compatibility matrix, so that $B^t_{rs}$ describes the likelihood of interaction
between two nodes in roles r and s at time t. When $B_t$ is diagonal, the only possible interactions are
among the nodes in the same role. Also, $\rho$ is a parameter that accounts for the sparsity of the network.

Thus, the coupling between dynamics of different nodes is introduced by allowing the role vector of a node
to be influenced by the role vectors of its neighbors. To benefit from computational simplicity, we update $\pi$
by changing its associated $\mu$. This update of $\mu$ is a linear combination of $\mu$ at its current state and the values
of its neighbors. The influence is measured by a node-specific parameter $\beta_p$ and by $w_{p \leftarrow q}$. $\beta_p$ describes how
easily the node p is influenced by its neighbors: $\beta_p = 0$ means it is not influenced at all, whereas $\beta_p = 1$
means the behavior is solely determined by the neighbors. Conversely, $w_{p \leftarrow q}$ reflects the influence that node
q exerts on node p, so that larger values correspond to more influence.
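The generative process above can be sketched as a small forward simulation. Several choices here are assumptions made only for illustration: a zero prior mean with unit variance, an isotropic noise standard deviation `noise_sd`, a time-independent B, and a neighbor term normalized by the number of neighbors (one reading of "average"). All function names are hypothetical.

```python
import math
import random

def softmax(mu):
    """Logistic-normal map from mu to a point on the simplex."""
    m = max(mu)
    e = [math.exp(v - m) for v in mu]
    s = sum(e)
    return [v / s for v in e]

def sample_role(pi, rng):
    """Draw a role index k with probability pi[k]."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(pi):
        acc += p
        if u < acc:
            return k
    return len(pi) - 1

def sample_network(pis, B, rho, rng):
    """Sample directed links Y[p][q] ~ Bernoulli((1 - rho) * B[g][h])."""
    n = len(pis)
    Y = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            g = sample_role(pis[p], rng)  # role p takes towards q
            h = sample_role(pis[q], rng)  # role q takes towards p
            Y[p][q] = 1 if rng.random() < (1 - rho) * B[g][h] else 0
    return Y

def next_mean(mus, Y, beta, w, p):
    """Mean of node p's next mu: own state mixed with weighted neighbor states."""
    nbrs = [q for q in range(len(mus)) if Y[p][q]]
    if not nbrs:
        return list(mus[p])
    K = len(mus[p])
    # Assumption: the weighted neighbor sum is divided by the neighbor count.
    avg = [sum(w[p][q] * mus[q][k] for q in nbrs) / len(nbrs) for k in range(K)]
    return [(1 - beta[p]) * mus[p][k] + beta[p] * avg[k] for k in range(K)]

def simulate(n, K, T, B, rho, beta, w, noise_sd, seed=0):
    """Return the list of networks Y_0..Y_T generated by the co-evolving process."""
    rng = random.Random(seed)
    mus = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(n)]
    networks = []
    for _ in range(T + 1):
        pis = [softmax(mu) for mu in mus]
        Y = sample_network(pis, B, rho, rng)
        networks.append(Y)
        means = [next_mean(mus, Y, beta, w, p) for p in range(n)]
        mus = [[rng.gauss(m, noise_sd) for m in mean] for mean in means]
    return networks
```

Setting every `beta[p]` to zero removes the coupling, so each node then follows an independent random walk; comparing the two regimes is a simple way to see the influence mechanism at work.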


2.1.  Inference  and  Learning 


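Under the Co-Evolving MMSB, the joint probability of the data $Y_{0:T}$ and the latent variables
$\{\mu^{0:T}_{1:N},\; z^t_{p \to q} : p,q \in N,\; z^t_{p \leftarrow q} : p,q \in N\}$ can be written in the following factored form. To simplify the
notation, we define $Z^t_{p,q}$ as the pair of $z^t_{p \to q}$ and $z^t_{p \leftarrow q}$.

    $p(Y_{0:T}, \mu^{0:T}_{1:N}, Z^{0:T}_{\to}, Z^{0:T}_{\leftarrow} \mid \alpha, A, B, \beta_p, \Sigma_\mu) = \prod_t \prod_{p,q} P(Y_t(p,q) \mid Z^t_{p,q}, B)\, P(Z^t_{p,q} \mid \mu^t_p, \mu^t_q)$    (6.1)

    $\qquad \times \prod_{t>0} \prod_p P(\mu^t_p \mid \mu^{t-1}_p, \mu_{S(p,t-1)}, \Sigma_\mu, \beta_p)\; \prod_p P(\mu^0_p \mid \alpha^0, A)$    (6.2)

In Equation 6.1, the term describing the dynamics of the membership vector is defined as follows2:

    $P(\mu^t_p \mid \mu^{t-1}_p, \mu_{S(p,t-1)}, \Sigma_\mu, \beta_p) = f_G(\mu^t_p - f_b(\mu^{t-1}_p, \mu_{S(p,t-1)}))$    (6.3)

    $f_b(\mu^{t-1}_p, \mu_{S(p,t-1)}) = (1 - \beta_p)\,\mu^{t-1}_p + \beta_p\,\mu_{S(p,t-1)}$    (6.4)


1 We found that the logistic normal form of the membership vector suggested in [57] led to more tractable equations compared
to the Dirichlet distribution.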

2 For simplicity, we will assume $\Sigma_\mu$ is a diagonal matrix.

Algorithm 1 Variational EM

Input: data $Y_t(p,q)$, size N, T, K
Initialize all $\{\gamma\}^t$, $\{\sigma\}^t$
Start with an initial guess for the model parameters.
repeat
    repeat
        for t = 0 to T do
            repeat
                Initialize $\phi^t_{p \to q,g}$, $\phi^t_{p \leftarrow q,h}$ to $1/K$ for all g, h
                repeat
                    Update all $\{\phi\}^t$
                until convergence of $\{\phi\}^t$
                Find $\{\sigma\}^t$, $\{\gamma\}^t$
                Update all $\{\zeta\}^t$
            until convergence in time t
        end for
    until convergence across all time steps
    Update hyper parameters.
until convergence in hyper parameters

Performing exact inference and learning under this model is not feasible. Thus, one needs to resort to
approximate techniques. Here we use a variational EM [19, 189] approach. The main idea behind variational
methods is to posit a simpler distribution q(X) over the latent variables with free (variational) parameters,
and then fit those parameters so that the distribution is close to the true posterior in KL divergence.

    $D_{KL}(q \| p) = \int_X q(X) \log \frac{q(X)}{p(X, Y)}\, dX$    (6.5)

Here we introduce the following factorized variational distribution:

    $q(\mu^{0:T}_{1:N}, Z^{0:T}_{\to}, Z^{0:T}_{\leftarrow} \mid \gamma, \Sigma, \Phi_{\to}, \Phi_{\leftarrow}) = \prod_{p,t} q_1(\mu^t_p \mid \gamma^t_p, \Sigma^t_p) \times \prod_{p,q,t} \big( q_2(z^t_{p \to q} \mid \phi^t_{p \to q})\, q_2(z^t_{p \leftarrow q} \mid \phi^t_{p \leftarrow q}) \big)$    (6.6)

where $q_1$ is the normal distribution, $q_2$ is the multinomial distribution, and $\gamma^t_p$, $\Sigma^t_p$, $\phi^t_{p \to q}$, $\phi^t_{p \leftarrow q}$ are the
variational parameters. Intuitively, $\phi^t_{p \to q,g}$ is the probability of node p undertaking the role g in an interaction
with node q at time t, and $\phi^t_{p \leftarrow q,h}$ is defined similarly. Note that in the E-step, we need to compute the
expected value of $\log[\sum_k \exp(\mu_k)]$ under the variational distribution, which is problematic. Toward this
end, we introduce N additional variational parameters $\zeta$ and replace the expectation of the log by its upper
bound induced from the first-order Taylor expansion:

    $\log[\sum_k \exp(\mu_k)] \le \log \zeta - 1 + \frac{1}{\zeta} \sum_k \exp(\mu_k)$    (6.7)
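The bound can be checked numerically with a short sketch. It relies on two standard facts: $\log x \le \log \zeta - 1 + x/\zeta$ for any $\zeta > 0$ (tangent line of the concave logarithm), and $E[\exp(\mu)] = \exp(\gamma + \sigma^2/2)$ for $\mu \sim \mathcal{N}(\gamma, \sigma^2)$. The function names and the diagonal-variance argument `sigma2` are illustrative assumptions.

```python
import math

def log_bound(x, zeta):
    """First-order Taylor upper bound on log(x), expanded around zeta > 0."""
    return math.log(zeta) - 1.0 + x / zeta

def expected_exp_sum(gamma, sigma2):
    """E[sum_k exp(mu_k)] for independent mu_k ~ N(gamma_k, sigma2_k)."""
    return sum(math.exp(g + 0.5 * v) for g, v in zip(gamma, sigma2))

def expected_bound(gamma, sigma2, zeta):
    """Upper bound on E[log sum_k exp(mu_k)] under the Gaussian q."""
    return math.log(zeta) - 1.0 + expected_exp_sum(gamma, sigma2) / zeta

def optimal_zeta(gamma, sigma2):
    """The zeta that minimizes expected_bound: the expected sum itself."""
    return expected_exp_sum(gamma, sigma2)
```

At the optimal `zeta` the bound collapses to `log(zeta)`, which is why a closed-form update of the form of Eq. 6.12 is available for this parameter.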

The  variational  EM  algorithm  works  by  iterating  between  the  E-step  of  calculating  the  expectation 
value  using  the  variational  distribution,  and  the  M-step  of  updating  the  model  (hyper)parameters  so  that 
the data likelihood is locally maximized. The pseudo-code is shown in Algorithm 1, and the details of the
calculations are discussed below.

2.2.  Variational  E-step 

In the variational E-step, we minimize the KL distance over the variational parameters. Taking the derivative
of KL divergence with respect to each variational parameter and setting it to zero, we obtain a set of equations
that can be solved via iterative or other numerical techniques. For instance, the variational parameters
$(\phi^t_{p \to q}, \phi^t_{p \leftarrow q})$, corresponding to a pair of nodes (p, q) at time t, can be found via the following iterative
scheme:

    $\phi^t_{p \to q,g} \propto \exp(\gamma^t_{p,g}) \times \prod_h \big( B(g,h)^{Y_t(p,q)} (1 - B(g,h))^{1 - Y_t(p,q)} \big)^{\phi^t_{p \leftarrow q,h}}$    (6.8)

    $\phi^t_{p \leftarrow q,h} \propto \exp(\gamma^t_{q,h}) \times \prod_g \big( B(g,h)^{Y_t(p,q)} (1 - B(g,h))^{1 - Y_t(p,q)} \big)^{\phi^t_{p \to q,g}}$    (6.9)

In the above equations, $\phi^t_{p \to q,g}$ and $\phi^t_{p \leftarrow q,h}$ are normalized after each update. Note also that Eqs. 6.8 and 6.9
are coupled with each other as well as with the parameters $\gamma^t_{p,g}$, $\gamma^t_{q,h}$.
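A minimal sketch of this coupled fixed-point iteration for a single pair (p, q) is given below, working in log space for stability. It assumes all entries of B lie strictly between 0 and 1 and starts both vectors from the uniform value 1/K; the function names are hypothetical.

```python
import math

def normalise(v):
    """Scale a list of positive numbers so that it sums to one."""
    s = sum(v)
    return [x / s for x in v]

def update_phi_pair(gamma_p, gamma_q, B, y, iters=50):
    """Alternate the two role-indicator updates for one pair and one time step.

    gamma_p, gamma_q: variational means of the two nodes (length K).
    B: K x K role-compatibility matrix with entries strictly in (0, 1).
    y: observed link indicator, 0 or 1.
    """
    K = len(gamma_p)
    phi_out = [1.0 / K] * K  # phi_{p->q}
    phi_in = [1.0 / K] * K   # phi_{p<-q}
    # Log Bernoulli likelihood of the observation for every role pair (g, h).
    ll = [[math.log(B[g][h]) if y else math.log(1.0 - B[g][h])
           for h in range(K)] for g in range(K)]
    for _ in range(iters):
        phi_out = normalise([
            math.exp(gamma_p[g] + sum(phi_in[h] * ll[g][h] for h in range(K)))
            for g in range(K)])
        phi_in = normalise([
            math.exp(gamma_q[h] + sum(phi_out[g] * ll[g][h] for g in range(K)))
            for h in range(K)])
    return phi_out, phi_in
```

With a diagonally dominant B and an observed link, the iteration pulls both nodes toward the same role, which matches the intuition that linked nodes are likely to share a role.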

For the variational parameters $\Sigma^t_p$ we have, for the diagonal components $(\sigma^t_{p,1}, \sigma^t_{p,2}, \ldots, \sigma^t_{p,K})$:

    $\frac{\eta_k^2}{(\sigma^t_{p,k})^2} = 1 + (1 - \beta_p)^2 + \sum_q Y_t(p,q)\, \beta_q^2\, w^2_{q \leftarrow p} + 2 \eta_k^2 (N - 1) \frac{1}{\zeta^t_p} \exp\big(\gamma^t_{p,k} + \frac{(\sigma^t_{p,k})^2}{2}\big)$    (6.10)

where $\eta_k^2$ is the diagonal component of the covariance matrix $\Sigma_\mu$. Similarly, we obtain equations for the
variational parameters $\gamma$. Generally, those equations are different for $\gamma^0_p$, $\gamma^T_p$, and $\gamma^t_p$, 0 < t < T.
Since those equations are too cumbersome, here we simply note that their general form is:

    $\gamma^t_p =$    (6.11)

Thus, the parameter $\gamma^t_p$ depends on its past and future values, $\gamma^{t-1}_p$ and $\gamma^{t+1}_p$, as well as the parameters of
its neighbors. Finally, for the variational parameters $\zeta$ we have

    $\zeta^t_p = \sum_i \exp\big(\gamma^t_{p,i} + \frac{(\sigma^t_{p,i})^2}{2}\big)$    (6.12)

Note that the above equations can be solved via simple iterative update as before. To expedite convergence,
however, we combine the iterations with the Newton-Raphson method, where we solve for individual
parameters while keeping the others fixed, and then repeat this process until all the parameters have converged.


2.3.  Variational  M  step 


The M-step in the EM algorithm computes the parameters by maximizing the expected log-likelihood found
in the E-step. The model parameters in our case are: $B_t$, the role-compatibility matrix; the covariance matrix
$\Sigma_\mu$; $\beta_p$ for each node; $w_{p \leftarrow q}$ for each pair; and $\alpha$ and $A$ from the prior.

If we assume that the time variation of the block compatibility matrix is small compared to the evolution
of the node attributes, we can neglect the time dependence in B, and use its average across time, which
yields:

    $B(g,h) = \frac{\sum_{p,q,t} Y_t(p,q)\, \phi^t_{p \to q,g}\, \phi^t_{p \leftarrow q,h}}{(1 - \rho) \sum_{p,q,t} \phi^t_{p \to q,g}\, \phi^t_{p \leftarrow q,h}}$    (6.13)
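A time-averaged estimate of the role-compatibility matrix can be sketched as a weighted link frequency, in the style used for mixed membership blockmodels: each observed pair contributes to cell (g, h) in proportion to the product of its two role probabilities. The flat `pairs` input format, the function name, and the handling of empty cells are assumptions made for illustration.

```python
def update_B(pairs, K, rho=0.0):
    """Estimate B[g][h] from observations pooled over all pairs and time steps.

    pairs: iterable of (y, phi_out, phi_in), where y is the 0/1 link indicator
    and phi_out, phi_in are the length-K role probabilities of the two nodes.
    rho: sparsity parameter; cells with no support are returned as 0.0.
    """
    num = [[0.0] * K for _ in range(K)]
    den = [[0.0] * K for _ in range(K)]
    for y, phi_out, phi_in in pairs:
        for g in range(K):
            for h in range(K):
                wgt = phi_out[g] * phi_in[h]  # responsibility of role pair (g, h)
                num[g][h] += y * wgt
                den[g][h] += wgt
    return [[num[g][h] / ((1.0 - rho) * den[g][h]) if den[g][h] > 0 else 0.0
             for h in range(K)] for g in range(K)]
```

Pooling over time steps is what the small-time-variation assumption buys: one matrix is estimated from all of the data instead of one per time point.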


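Likewise, for the update of diagonal components of the noise covariance matrix:

    $\eta_k^2 = \frac{1}{NT} \sum_{p,t} \big[ (\gamma^t_{p,k} - (1 - \beta_p)\,\gamma^{t-1}_{p,k} - \beta_p\,\gamma_{S(p,t-1),k})^2 \big]$    (6.14)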


Similar equations are obtained for $\beta_p$ and $w_{p \leftarrow q}$. The update equation of $\beta_p$ and $w_{p \leftarrow q}$ is a function of $\gamma$ and
$\sigma$, which are related to the transition for the specific node p. Since these equations are rather involved, they will
be provided elsewhere.

The priors of the model can be expressed in closed form as below:

    (6.15)

    $\alpha_k =$    (6.16)


2.4.  Results  for  US  Senate  Co-Sponsorship  Network 

We have also performed some preliminary experiments for testing our model against real-world data. In
particular, we used Senate co-sponsorship networks from the 97th to the 104th Senate, by considering each
Senate as a separate time point in the dynamics. There were 43 senators who remained part of the Senate
during this period. For any pair of senators (p, q) in a given Senate, we generated a directed link p → q if p
co-sponsored at least 3 bills that q originally sponsored. The threshold of 3 bills was chosen to avoid having
too dense a network. With this data, we wanted to test (a) to what extent senators tend to follow others
who share their political views (i.e., conservative vs. liberal) and (b) whether some senators change their
political creed more easily than others.
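The link-construction rule can be written down directly. In this small sketch the input format (a dictionary of co-sponsorship counts keyed by ordered pairs) and the function name are assumptions for illustration; only the threshold of 3 comes from the text.

```python
def build_links(cosponsor_counts, threshold=3):
    """Return the set of directed links p -> q.

    cosponsor_counts[(p, q)] is the number of bills originally sponsored by q
    that p co-sponsored within one Senate (hypothetical input format).
    """
    return {(p, q) for (p, q), c in cosponsor_counts.items()
            if p != q and c >= threshold}
```

Applying the function once per Senate yields the sequence of networks, one per time point, that the model takes as input.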

The number of roles K = 2 was chosen to reflect the mostly bi-polar nature of the US Senate. The
susceptibility of senator p to influence is measured by the corresponding parameter $\beta_p$, which is learned
using the EM algorithm. High $\beta$ means that a senator tends to change his/her role more easily. Likewise, the
power of influence of senator q on senator p is measured by the parameter $w_{p \leftarrow q}$, where $w_{p \leftarrow q_1} > w_{p \leftarrow q_2}$
means senator $q_1$ is more influential on senator p than senator $q_2$. Here the direction of the arrow reflects
the direction of the influence, which is opposite to the direction of the link. To initialize the EM procedure, we
assigned the same $\beta$ and w to all the senators, and started with a matrix weighted at the diagonal for
B.

Another method for validation is to compare the degree of influence. Our model handles, and learns,
the degree of influence in the update equation. Identifying influential senators is an area of active research.
KNOWLEGIS has been ranking US senators based on various criteria, including influence, since
2005. Since our data was extracted from the 97th Senate to the 104th Senate, direct comparison of the
rankings was impossible. Another study [101] ranked the 10 most influential senators in both parties who
have been elected since 1955. We compared our top 5 influential senators against this list, and were able to find 3
senators (Sen. Byrd, Sen. Thurmond, and Sen. Dole) in the list.

2.4.1.  Interpreting  Results 

The role-compatibility matrix learned from the Variational EM has high values on the diagonal, confirming
our intuition that interaction is indeed more likely between senators that share the same role. Furthermore,
the learned values of $\beta$ showed that senators varied in their “susceptibility”. In particular, Sen. Arlen Specter
was found to be the most influenceable one, while Sen. Dole was found to be one of the most inert ones.
Note that while there are no direct ways of estimating the “dynamism” of senators, our results seem to agree
with our intuition about both senators (e.g., Sen. Specter switched parties in 2009 while Dole became his
party's candidate for President in 1996).

To get some independent verification, we compared our results to the yearly ratings that ACU (American
Conservative Union) and ADA (Americans for Democratic Action) assign to senators.3 ACU/ADA rated
every senator based on selected votes which they believed to have a clear ideological distinction, so that high
scores in ACU mean that they are truly conservative, while lower scores in ACU suggest they are liberal,
and for ADA vice versa. To compare the rating with our predictions (given by the membership vector) we
scaled the former to get scores in the range [0,1].

3 Accessible at http://www.conservative.org/,
http://www.adaction.org/

Figure 6.2 shows the relationship between these scores and our mixed membership vector score,
confirming our interpretation of the two roles in our model as corresponding to liberal/conservative. Although
those values cannot be used for quantitative agreement, we found that at least qualitatively, the inferred
trajectories agree reasonably well with the ACU/ADA ratings. This agreement is rather remarkable since the
ACU/ADA scores are based on selected votes rather than co-sponsorship network as in our data.


[Panel titles: "Correlation between Inference and ACU score"; "Correlation between Inference and ADA score"]

Figure 6.2: Correlation between ACU/ADA scores and inferred probabilities.

Of course, we are most interested in correctly identifying the dynamics for each senator. We compare
our inferred trajectories of the most dynamic senator and the most inert senator to the scores of ACU and ADA.
In Figure 6.3 the scores of ADA have been flipped, so that we can compare all of the scores on the same
scale. However, since ACU/ADA scores are rated for every senator each year, the dynamics of
inference and the dynamics of ACU/ADA scores cannot be compared one to one. Not all senators showed
as high a correlation of the trend as Senators Specter and Dole.


[Horizontal axis of both panels: Congress number]

Figure 6.3: Comparison of inference results with ACU and ADA scores: Sen. Specter (top) and Sen. Dole
(bottom).

2.4.2.  Polarization  Dynamics 

The yearly ACU/ADA scores give a good comparison of the relative political position of senators scored
in each year. However, they are not very appropriate for comparison between years, because
the score is based on voting records for different bills in each year. Therefore, for validation of
the dynamics we turn to another scoring system highly regarded by political scientists and used to observe
historical trends, the DW-NOMINATE score. For the time period of our study, [103] shows that the political
polarization of the Senate was increasing. In particular, they show that the gap between the average
DW-NOMINATE score of Republicans and Democrats is monotonically increasing, as we show in Figure 6.4. In
fact, the polarization for the entire Senate was stronger every year. This is due to the unbalanced seats in the
entire Senate: our data had 22 Republicans and 21 Democrats, while for the entire Senate the
majority outnumbered the minority by around 10 seats. For comparison, for each time step we took the average
of our inferred score for the 14 most and least conservative senators. As we show in Figure 6.4, our inferred
result agrees qualitatively with the results of [103], showing an increase in polarization for every Senate in
the studied time-window. Since the DW-NOMINATE score uses its own metric, and our polarization is
measured by the difference between upper average and lower average probability, we should not expect to
get quantitative agreement. We would like to highlight, however, that the direction of the trend is correctly
predicted for each of the eight terms.

[Horizontal axis: Congress number]

Figure 6.4: Polarization trends during 97th-104th US Congresses.

3.  Theoretical  Analysis  of  Hidden  Markov  Models 

Hidden Markov Models (HMMs) provide one of the simplest examples of structured data observed through a
noisy channel. The inference problems for HMMs naturally divide into two classes [55, 132]: (i) recovering
the hidden sequence of states given the observed sequence, and (ii) estimating the model parameters
(transition probabilities of the hidden Markov chain and/or conditional probabilities of observations) from
the observed sequence. The first class of problems is usually solved via the maximum a posteriori (MAP)
method and its computational implementation, known as the Viterbi algorithm [55, 132]. For the parameter
estimation problem, the prevailing method is maximum likelihood (ML) estimation, which finds the parameters
by maximizing the likelihood of the observed data. An alternative approach to parameter learning is Viterbi
Training (VT), also known in the literature as segmental K-means, the Baum-Viterbi algorithm, classification
EM, hard EM, etc. Instead of maximizing the likelihood of the observed data, VT seeks to maximize the
probability of the most likely hidden state sequence.
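
As a concrete reference point for the MAP step mentioned above, the following is a minimal log-space sketch
of the Viterbi recursion for a discrete HMM. The function name `viterbi` and the list-of-lists layout of the
parameters are illustrative assumptions, not notation taken from this report.

```python
import math


def viterbi(obs, init, trans, emit):
    """Return (MAP state path, log joint probability) for a discrete HMM.

    obs   : list of observation indices
    init  : init[s]     = P(first state = s)
    trans : trans[s][t] = P(next state = t | current state = s)
    emit  : emit[s][o]  = P(observation = o | state = s)
    (This parameter layout is an assumption made for the sketch.)
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(init)
    # score[s]: best log-probability of any path that ends in state s
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for t in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda s: score[s] + log(trans[s][t]))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log(trans[best_prev][t])
                             + log(emit[t][o]))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

The recursion returns a single optimal path; when several paths are tied, which one is returned depends on
the tie-breaking of `max`.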

During  this  project  we  have  developed  methods  based  on  statistical  physics  of  disordered  systems  that 
allowed  us  to  analyze  asymptotic  properties  of  inference  methods  in  HMMs.  Below  we  outline  the  main 
elements  of  our  approach  and  summarize  our  main  findings. 

3.1.  Theoretical  Analysis  of  Trackability 

Despite the extensive use of the Viterbi algorithm in many applications, its properties, and specifically
the structure of its solution space, have received surprisingly little attention. On the other hand, it is
clear that choosing a single state sequence might be insufficient for adequately understanding the structure
of the inferred process. To get a more complete picture, one needs to know whether there are other nearly
optimal sequences, how many of them there are, how they compare with the optimal solution, and so on. In our
work, we have shown that this can be related to the notion of trackability, which can be intuitively defined
as one's ability to (accurately) track certain stochastic processes [37, 144]. This problem was addressed by
Crespi et al. [37] for so-called weak models, where the entries of the HMP transition and emission matrices
are either 0 or 1. They reported a sharp transition between the trackable and non-trackable regimes. Here
trackable means that the number of hypotheses that can explain a given observation sequence grows at most
polynomially with the length of the sequence, whereas non-trackable means that the number of such hypotheses
grows exponentially.
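
To make the hypothesis-counting notion concrete, here is a small sketch that counts the state sequences
compatible with an observation sequence in a weak model. The function name `count_hypotheses` and the 0/1
matrix layout are assumptions made for illustration; this is not the construction used in [37].

```python
def count_hypotheses(obs, allowed_trans, allowed_emit):
    """Count hidden state sequences that can explain `obs` in a weak model.

    allowed_trans[s][t] is 1 if the transition s -> t is possible, else 0.
    allowed_emit[s][o]  is 1 if state s can emit symbol o, else 0.
    """
    n_states = len(allowed_trans)
    # counts[s]: number of compatible sequences that end in state s
    counts = [1 if allowed_emit[s][obs[0]] else 0 for s in range(n_states)]
    for o in obs[1:]:
        counts = [
            sum(counts[s] for s in range(n_states) if allowed_trans[s][t])
            if allowed_emit[t][o] else 0
            for t in range(n_states)
        ]
    return sum(counts)


# Two states that emit the same symbol, so observations carry no information.
same_symbol = [[1], [1]]
fully_connected = [[1, 1], [1, 1]]   # every transition allowed
one_way = [[1, 1], [0, 1]]           # state 1 can never return to state 0
```

With `fully_connected` the count doubles with every observation (exponential growth), whereas with `one_way`
a sequence can switch state at most once, so the count grows only linearly with the length.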

For more general stochastic processes, an information-theoretical characterization of trackability was
suggested in [144]. Within this approach, the accuracy is characterized by the probability
$\Pr[\hat{x} \neq x]$ of the estimated sequence $\hat{x}$ not being equal to the actual one, while the
structure of the solution space is described via the number of elements $|H|$ in the (conditional) typical
set $H$ of $x$ sequences given an observed sequence $y$ (complexity).

Note that the accuracy and the complexity measures of [144] deteriorate even for a small (but generic)
noise intensity, so that a process is trackable only in the complete absence of noise. During this phase of
work, we have suggested an alternative measure of trackability that is more intuitive in the sense that it
allows a finite amount of noise. Namely, we suggested augmenting the notion of trackability with the
particular inference method being used. Generally, the structure of an inference method can be characterized
by the accuracy of the estimation and by the number $\mathcal{N}(y)$ of solutions $\hat{x}(y)$ that the
method can produce in response to a given sequence $y$. For instance, an HMM process can be said to be
trackable under Viterbi (or, more generally, maximum a posteriori, MAP) inference if it yields at most a
polynomial (in the observation length) number of solutions with reasonable accuracy.

In our MURI work we have studied the structure of MAP inference for the simplest binary, symmetric HMM by
reducing it to the Ising model in random fields. In this way, the average cost
$-\sum_y \Pr(y)\Pr(\hat{x}(y)|y)$ of MAP and the logarithm of the number of solutions
$\sum_y \Pr(y)\ln\mathcal{N}(y)$ relate, respectively, to the energy and the entropy of the Ising model at
zero temperature. Consider a binary, discrete-time Markov stochastic process $X = (X_1, X_2, \ldots, X_N)$.
Each random variable $X_k$ has only two realizations $x_k = \pm 1$. The Markov feature implies

$$P(x) = \prod_{k=2}^{N} p(x_k|x_{k-1})\, p(x_1), \qquad (6.17)$$

where $p(x_k|x_{k-1})$ is a time-independent transition probability of the Markov process. For the considered
binary symmetric situation it is parameterized by a single number $0 < q < 1$:
$p(1|1) = p(-1|-1) = 1 - q$, $p(1|-1) = p(-1|1) = q$, and the stationary distribution is
$p_{st}(1) = p_{st}(-1) = 1/2$. Furthermore, the noise process is assumed to be memory-less,
time-independent and unbiased:

$$P(y|x) = \prod_{k=1}^{N} \pi(y_k|x_k), \qquad y_k = \pm 1, \qquad (6.18)$$

where $\pi(-1|1) = \pi(1|-1) = \epsilon$, $\pi(1|1) = \pi(-1|-1) = 1 - \epsilon$, and $\epsilon$ is the
probability of error. Here memory-less refers to the factorization in (6.18), time-independence refers to the
fact that in (6.18) $\pi(\ldots|\ldots)$ does not depend on $k$, while unbiased means that the noise acts
symmetrically on both realizations of the Markov process: $\pi(1|-1) = \pi(-1|1)$.

It can be shown that, with an appropriate parameterization, MAP estimation is identical to minimizing the
following Ising Hamiltonian:

$$H(y, x) = -J \sum_{k=1}^{N} x_k x_{k+1} - h \sum_{k=1}^{N} y_k x_k, \qquad (6.19)$$

where $2J = \ln[(1-q)/q]$ and $2h = \ln[(1-\epsilon)/\epsilon]$. The (positive) factor $J$ in (6.19) is the
spin-spin interaction constant, and the fields $h y_k$ are determined by the observations.

The expected number of ground-state configurations (i.e., sequences $x_1, x_2, \ldots, x_N$) that minimize
the above Hamiltonian can be related to the zero-temperature thermodynamic entropy $\Theta$,

$$\Theta = \sum_y P(y) \ln \mathcal{N}(y). \qquad (6.20)$$

Thus, a finite $\Theta$ means that there are exponentially many outcomes of minimizing $H(y, x)$ over $x$.
Furthermore, let us introduce the free energy ($T$ is the temperature, and $\beta = 1/T$)

$$F(J, h, T) = -T \sum_y P(y) \ln \sum_x e^{-\beta H(y, x; J, h)}. \qquad (6.21)$$

Then the entropy is given as $\Theta = -\partial_T F|_{T \to 0}$. We can also define the overlap between the
observed and inferred sequences, which is given by $v = -\frac{1}{N}\partial_h F$.
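
The mapping to the Ising Hamiltonian and the ground-state degeneracy $\mathcal{N}(y)$ can be checked by brute
force on a short chain. The sketch below is only an illustration under stated assumptions: the function
names are hypothetical, the bond sum is taken over an open chain ($k = 1, \ldots, N-1$), and the enumeration
is feasible only for small $N$.

```python
import itertools
import math


def neg_log_joint(x, y, q, eps):
    """-ln[P(x) P(y|x)] for the binary symmetric HMM of (6.17)-(6.18)."""
    total = -math.log(0.5)  # stationary distribution of the first spin
    for prev, cur in zip(x, x[1:]):
        total -= math.log(1 - q if prev == cur else q)
    for xk, yk in zip(x, y):
        total -= math.log(1 - eps if xk == yk else eps)
    return total


def ising_energy(x, y, q, eps):
    """H(y, x) of (6.19); assumption: open chain, bonds k = 1..N-1."""
    J = 0.5 * math.log((1 - q) / q)
    h = 0.5 * math.log((1 - eps) / eps)
    bonds = sum(a * b for a, b in zip(x, x[1:]))
    fields = sum(xk * yk for xk, yk in zip(x, y))
    return -J * bonds - h * fields


def ground_states(y, q, eps, tol=1e-9):
    """All minimizers of H(y, x) over x, found by exhaustive enumeration."""
    configs = list(itertools.product((-1, 1), repeat=len(y)))
    energies = [ising_energy(x, y, q, eps) for x in configs]
    e_min = min(energies)
    return [x for x, e in zip(configs, energies) if e - e_min < tol]
```

For every $x$ the two functions differ by the same $x$-independent constant, so minimizing the energy and
maximizing the posterior select the same sequences, and `len(ground_states(y, q, eps))` plays the role of
$\mathcal{N}(y)$ for that particular $y$.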

The quantities defined above can be calculated exactly. Our main results can be stated as follows.
For small noise intensities, defined by $\epsilon < q/2$, the inferred state sequence coincides with the
observation sequence, and hence there is no difference between MAP and maximum likelihood (ML) estimation,
as the solution is observation-dominated. While it was expected that the two methods agree for vanishing
noise, the fact of their exact agreement for a finite range of the noise is non-trivial. In this regime, the
entropy is identically zero, which defines a trackable process. Furthermore, upon increasing the noise
intensity the MAP solution switches between different operational regimes that are separated by first-order
phase transitions. In particular, a first-order phase transition separates the trackable and non-trackable
regimes. At this transition point the influence of the prior information becomes comparable to the influence
of the observations. This is shown in Figures 6.5(a) and 6.5(b).


Figure 6.5: MAP characteristics versus the noise intensity in the regimes $m = 1, 2, 3$ for $q = 0.24$:
(a) overlap; (b) entropy. In (a) the open squares represent simulation results, obtained by running the
Viterbi algorithm and calculating the respective quantities directly. We used sequences of size $10^4$ and
averaged the results over 100 random trials.
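
A simulation of the kind described in the caption can be sketched as follows: sample a binary symmetric
HMM, find one MAP sequence by dynamic programming on the Ising energy (6.19), and measure the overlap with
the observations. The function names, the open-chain treatment of the bond term, and the tie-breaking rule
are assumptions of this sketch, not details of the simulations reported here.

```python
import math
import random


def sample_hmm(n, q, eps, rng):
    """Draw a hidden +/-1 chain (flip probability q) and its noisy copy."""
    x = [rng.choice((-1, 1))]
    for _ in range(n - 1):
        x.append(-x[-1] if rng.random() < q else x[-1])
    y = [-xk if rng.random() < eps else xk for xk in x]
    return x, y


def map_sequence(y, q, eps):
    """One minimizer of the Ising energy (6.19), found by Viterbi-style DP."""
    J = 0.5 * math.log((1 - q) / q)
    h = 0.5 * math.log((1 - eps) / eps)
    states = (-1, 1)
    cost = {s: -h * y[0] * s for s in states}
    back = []
    for yk in y[1:]:
        new_cost, pointers = {}, {}
        for s in states:
            prev = min(states, key=lambda p: cost[p] - J * p * s)
            pointers[s] = prev
            new_cost[s] = cost[prev] - J * prev * s - h * yk * s
        cost = new_cost
        back.append(pointers)
    last = min(states, key=lambda s: cost[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def overlap(y, x_hat):
    """Average of y_k * x_hat_k over the sequence."""
    return sum(a * b for a, b in zip(y, x_hat)) / len(y)
```

Averaging `overlap` over independent draws of `sample_hmm` for a grid of noise values gives a curve that can
be compared with panel (a); for a very small noise value the inferred sequence reproduces the observations
and the overlap equals one.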

There are several directions for further development. For instance, it will be interesting to generalize the
analysis beyond the binary hidden Markov processes considered here. In this case, the MAP optimization
problem can be mapped to a Potts model. We would like to note that the behavior observed in the simple
binary model can be explained by the emergence of a finite fraction of “frustrated” spins, where the
frustration can be attributed to two competing tendencies: accommodating the observations on one hand, and
the hidden (Markovian) dynamical model on the other. Since this mechanism is rather general, we believe that
most features of the MAP scheme uncovered here via an exact analysis of the simplest binary model will
survive in more general situations.

3.2.  Comparative  Analysis  of  Viterbi  Training  and  Maximum  Likelihood  Estimation  for  HMMs 

As mentioned above, there are two main methods for parameter estimation in HMMs. Maximum likelihood (ML)
estimation finds the parameters by maximizing the likelihood of the observed data, whereas Viterbi Training
(VT) seeks to maximize the probability of the most likely hidden state sequence. Maximizing the VT objective
function is hard, so in practice it is implemented via EM-style iterations that alternate between
calculating the MAP sequence and adjusting the model parameters based on the sequence statistics. It is known
that VT lacks some of the desired features of ML estimation, such as consistency, and in fact can produce
biased estimates [55]. However, it has been shown to perform well in practice, which explains its widespread
use in applications such as speech recognition [17], unsupervised dependency parsing [158], and so on. It is
generally assumed that VT is more robust and faster but usually less accurate, although for certain tasks it
outperforms conventional EM [158].

It is not well established when and under what circumstances one method should be preferred over the other.
For HMMs with continuous observations, Ref. [105] established an upper bound on the difference between the
ML and VT objective functions, and showed that both approaches produce asymptotically similar estimates when
the dimensionality of the observation space is very large. Note, however, that this asymptotic limit is not
very interesting, as it makes the structure imposed by the Markovian process irrelevant. A similar attempt
to compare both approaches on discrete models (for stochastic context-free grammars) was presented in [142].
However, the established bound was very loose.

In this project, one of our goals was to understand, both qualitatively and quantitatively, the difference
between the two estimation methods. We developed an analytical approach based on generating functions
for examining the asymptotic properties of both approaches. Previously, a similar approach was used for
calculating the entropy rate of a hidden Markov process [5]. We have provided a non-trivial extension of the
method that allows a comparative asymptotic analysis of ML and VT estimation. It was shown that both
estimation methods correspond to a certain free-energy minimization problem at different temperatures.
Furthermore, we demonstrated the approach on a particular class of HMMs with one unambiguous symbol
and obtained a closed-form solution to the estimation problem. This class of HMMs is sufficiently rich to
include models where not all parameters can be determined from the observations, i.e., the model is not
identifiable [55].

Our main results are as follows. In contrast to the ML approach, which produces continuously degenerate
solutions, VT results in a finitely degenerate solution that is sparse, i.e., some [non-identifiable]
parameters are set to zero, and, furthermore, converges faster. Note that sparsity might be a desired feature
in many practical applications. For instance, imposing sparsity on conventional EM-type learning has been
shown to produce better results in part-of-speech tagging applications [180]. Whereas [180] had to impose
sparsity via an additional penalty term in the objective function, in our case sparsity is a natural outcome
of maximizing the likelihood of the best sequence. While our results were obtained on a class of exactly
solvable models, it is plausible that they hold more generally.

The fact that VT provides simpler and more definite solutions, among all choices of the parameters
compatible with the observed data, can be viewed as a type of Occam's razor for parameter learning.
Note finally that the statistical mechanics intuition behind these results is that the a posteriori
likelihood is the (negative) zero-temperature free energy of a certain physical system. Minimizing this free
energy makes physical sense: this is the premise of the second law of thermodynamics, which ensures
relaxation towards a more equilibrium state. In that zero-temperature equilibrium state certain types of
motion are frozen, which means nullifying the corresponding transition probabilities. In that way the second
law relates to Occam's razor.

4.  Semi-Supervised  Clustering  in  Graphs 

In recent years there has been a great deal of interest in modeling and understanding relational,
network-structured data. While traditional learning methods assume that data instances are independent and
identically distributed, relational learning takes into account non-independencies, in the form of links and
relations between different entities, and so extends learning to richly structured data. To represent
relational data, a number of different models have been proposed. One of the well-studied models for
networked data, with roots in social network analysis, is the so-called stochastic block-model [69], where
the nodes in the network are assigned to a number of groups (blocks). The main assumption is that the nodes
in the same group have similar attributes and are structurally equivalent in the sense that they have the
same pattern of links in the network. More recent work has suggested “softer” mixed membership models where
nodes can be associated with several groups simultaneously [4].

Often, class membership information is not available, and one needs to use statistical inference to recover
the latent structures. An important question then is to what extent this can be accomplished. Consider, for
instance, a simple block-model with two equal-sized groups, which can be characterized by two numbers
$p$ and $r$: the probability of a link within a group and between the groups, respectively. Thus, each node
is connected, on average, with $pN$ and $rN$ nodes within the same cluster and across the clusters,
respectively, where $N$ is the number of nodes in each group. The question is how well one can recover the
latent group structure through statistical inference in the limit of large $N$. The answer is
straightforward for dense networks, where the network connectivity scales linearly with $N$. Indeed, it
has been shown that the clusters in this planted partition model can be recovered with high accuracy in
polynomial time if $p - r \geq N^{-1/2+\epsilon}$ [34]. Thus, we can say that the detection threshold
converges to $p = r$ in the limit $N \to \infty$.

The situation can be significantly different for sparse graphs, where the average connectivity remains
finite as $N \to \infty$: $p = \alpha/N$, $r = \gamma/N$, where $\alpha$ and $\gamma$ are the average
connectivities within and between the clusters. Indeed, it was recently shown [134] that for sparse
block-structured networks with a given within-cluster connectivity $\alpha$ there is a critical
between-cluster connectivity $\gamma_c < \alpha$ such that for any $\gamma > \gamma_c$ the clusters
cannot be recovered with better than random accuracy in the asymptotic limit. More specifically, it was
demonstrated that the model is characterized by a phase transition from detectable to undetectable regimes
as one increases the overlap between the clusters. Clearly, this type of behavior is undesirable, as it
signals inference instabilities: large fluctuations in accuracy in response to small shifts in the
parameters.

In this project, we have shown that this instability can be suppressed if one knows the correct group
labels for a finite fraction of nodes. This can be viewed as a semi-supervised version of the problem, as
opposed to an unsupervised version where the only available information is the observed graph structure.
Generally, graph-based clustering methods can utilize two types of background knowledge: the correct group
assignment for a subset of nodes, or pair-wise constraints in the form of must-links (cannot-links), which
imply that a pair of nodes must be (cannot be) assigned to the same group. Below we describe our
studies that examine the impact of such pair-wise constraints on inference.

4.1.  Model 

Let $A$ be the observed adjacency matrix of an interaction graph of $N$ nodes, so that $A_{ij} = 1$ if we
have observed a link between nodes $i$ and $j$, and $A_{ij} = 0$ otherwise. Within the stochastic
block-model, the nodes in the network are assigned to a number of groups (blocks), and the probability of a
link between two nodes depends on their group membership. In the simple symmetric bi-cluster scenario
considered here, the model is characterized by two numbers, the probabilities of a link within and across the
groups, denoted $p$ and $r$, respectively. The conditional distribution of the observations for a given
configuration $x$ reads

$$p(A|x) = p^{c_+} [1-p]^{c_-} r^{d_+} [1-r]^{d_-}. \qquad (6.22)$$

Here $c_+$, $d_+$ ($c_-$, $d_-$) are the total numbers of observed (missing) links within and across the
groups,

$$c_+ = \sum_{i<j} A_{ij}\,\delta_{x_i,x_j}, \qquad c_- = \sum_{i<j} (1 - A_{ij})\,\delta_{x_i,x_j}, \qquad (6.23)$$

$$d_+ = \sum_{i<j} A_{ij}\,(1 - \delta_{x_i,x_j}), \qquad d_- = \sum_{i<j} (1 - A_{ij})(1 - \delta_{x_i,x_j}), \qquad (6.24)$$

where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. Let us define
$J_{NL} = \ln[(1-p)/(1-r)]$, $J_L = \ln[p/r] - J_{NL}$. Then the negative log of the joint distribution over
both observed and hidden variables can be written, up to a constant that does not depend on $x$, as follows:

$$H(x, A) = -\ln[p(A|x)p(x)] = -\sum_{i<j} \left[ J_L A_{ij}\,\delta_{x_i,x_j} + J_{NL}\,\delta_{x_i,x_j} \right] + H_\pi(x). \qquad (6.25)$$

Here $J_L$ and $J_{NL}$ stand for the contributions from observed links and non-links, respectively, while
the last term $H_\pi(x)$ encodes any prior information one might have about the latent structure. In the
scenario considered below, we assume that the cluster sizes are known a priori. This constraint can be
enforced by an appropriate choice of $H_\pi(x)$, so that any clustering arrangement that violates the size
constraint is disallowed. Then it is easy to check that the second term amounts to a constant that can be
ignored. Furthermore, since below we are interested in the minimum of $H(x, A)$, we can set $J_L = 1$ without
loss of generality.
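
The decomposition of the likelihood into pairwise couplings can be verified numerically. The sketch below
compares $-\ln p(A|x)$, computed directly from the counts in (6.22)-(6.24), with the pairwise sum in (6.25).
It is a hedged illustration: the function names are hypothetical, and $J_L$ is taken to be the log-odds
ratio $\ln[p(1-r)/(r(1-p))] = \ln[p/r] - J_{NL}$, which is what expanding (6.22) term by term gives.

```python
import itertools
import math
import random


def neg_log_likelihood(adj, x, p, r):
    """-ln p(A|x), computed directly from the counts in (6.22)-(6.24)."""
    c_plus = c_minus = d_plus = d_minus = 0
    for i, j in itertools.combinations(range(len(x)), 2):
        same = x[i] == x[j]
        if adj[i][j] and same:
            c_plus += 1
        elif same:
            c_minus += 1
        elif adj[i][j]:
            d_plus += 1
        else:
            d_minus += 1
    return -(c_plus * math.log(p) + c_minus * math.log(1 - p)
             + d_plus * math.log(r) + d_minus * math.log(1 - r))


def pairwise_energy(adj, x, p, r):
    """-sum_{i<j} [J_L A_ij + J_NL] delta(x_i, x_j): the x-dependent part."""
    j_nl = math.log((1 - p) / (1 - r))
    j_l = math.log(p / r) - j_nl  # log-odds ratio (assumption, see lead-in)
    energy = 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            energy -= j_l * adj[i][j] + j_nl
    return energy


def random_adjacency(n, density, seed=0):
    """Symmetric 0/1 matrix with independent links (illustration only)."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        adj[i][j] = adj[j][i] = int(rng.random() < density)
    return adj
```

For a fixed graph, the difference between the two functions is the same for every labeling $x$, so both have
the same minimizers.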

For given parameter values and a given observed graph, minimizing the above expression is equivalent
to maximum a posteriori estimation of the latent structure. The following remark is due. When $p$ and $r$
are chosen irrespective of $N$ (so that the average number of links per node scales linearly with $N$), we
expect the group structure to be recovered with high accuracy if $N$ is sufficiently large [34]. However,
many real-world networks are sparse. To account for this, we introduce the average numbers of neighbors
within and across the groups, $\alpha$ and $\gamma$, and let $p = \alpha/N$, $r = \gamma/N$, so that the
average connectivity remains finite as $N \to \infty$. Below we study the accuracy of the inference
depending on $\alpha$ and $\gamma$.
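
The sparse parameterization can be illustrated by sampling a two-group block-model with $p = \alpha/N$ and
$r = \gamma/N$ and checking that the mean within-group and across-group degrees stay close to $\alpha$ and
$\gamma$. The function names below are hypothetical, and the quadratic loop over node pairs is chosen for
clarity rather than speed.

```python
import random


def sample_block_model(n_per_group, alpha, gamma, seed=0):
    """Sample two equal-sized groups with p = alpha/N and r = gamma/N."""
    rng = random.Random(seed)
    n_total = 2 * n_per_group
    labels = [0] * n_per_group + [1] * n_per_group
    p, r = alpha / n_per_group, gamma / n_per_group
    edges = []
    for i in range(n_total):
        for j in range(i + 1, n_total):
            prob = p if labels[i] == labels[j] else r
            if rng.random() < prob:
                edges.append((i, j))
    return labels, edges


def mean_degrees(labels, edges):
    """Average number of within-group and across-group neighbors per node."""
    within = sum(1 for i, j in edges if labels[i] == labels[j])
    across = len(edges) - within
    n_total = len(labels)
    return 2 * within / n_total, 2 * across / n_total
```

Increasing `n_per_group` at fixed `alpha` and `gamma` leaves the mean degrees essentially unchanged, which is
the sense in which the connectivity remains finite in the large-$N$ limit.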

To proceed further, we make the bi-component nature of the network explicit by introducing separate
variables $x_i = \pm 1$ and $\bar{x}_i = \pm 1$ ($i = 1, \ldots, N$) for the two groups. Then Eq. 6.25 is
reduced to the following Ising Hamiltonian (aside from an unessential scaling factor):

$$H = -\sum_{i<j}^{N} J_{ij} x_i x_j - \sum_{i<j}^{N} \bar{J}_{ij} \bar{x}_i \bar{x}_j - \sum_{i,j}^{N} K_{ij} x_i \bar{x}_j + H_\pi(x). \qquad (6.26)$$

Here $J_{ij}$ and $\bar{J}_{ij}$ are the elements of the two diagonal blocks of the matrix $A$, describing the
connectivity within each cluster, whereas the $K_{ij}$ are the elements of the (upper) off-diagonal block of
$A$ that link nodes across the clusters. In the unsupervised block-model, they are random Bernoulli trials
with parameters $p$ and $r$.

To account for background information in the form of pairwise constraints, we use the following form
for the prior part of the Hamiltonian:

$$H_\pi(x) = -w_{ml} \sum_{i<j}^{N} \left[ \theta_{ij} x_i x_j + \bar{\theta}_{ij} \bar{x}_i \bar{x}_j \right] + w_{cl} \sum_{i,j}^{N} \phi_{ij} x_i \bar{x}_j, \qquad (6.27)$$

where $\theta_{ij} = 1$ ($\bar{\theta}_{ij} = 1$) if the corresponding pair of nodes is connected via a
must-link constraint within the first (second) cluster, and $\theta_{ij} = 0$ ($\bar{\theta}_{ij} = 0$)
otherwise. Similarly, $\phi_{ij} = 1$ if there is a cannot-link between the corresponding nodes in the
respective clusters, and $\phi_{ij} = 0$ if there is no such link. Here $w_{ml}$ and $w_{cl}$ are the costs of
violating a must-link and a cannot-link constraint, respectively. For the sake of simplicity, below we will
choose $w_{ml} = w_{cl} = M > 1$, where $M$ is an integer.

Below we will assume that the constraints are introduced randomly and independently for each pair of
nodes. Namely, the $\theta_{ij}$, $\bar{\theta}_{ij}$ and the $\phi_{ij}$ are Bernoulli trials with
parameters $f_+$ and $f_-$, respectively. Then the prior part of the Hamiltonian can be absorbed into 6.26 by
the following choice for the distribution of the couplings in 6.26:

$$p(J_{ij}) = [1 - p - f_+]\,\delta(J_{ij}) + p\,\delta(J_{ij} - 1) + f_+\,\delta(J_{ij} - w_{ml}), \qquad (6.28)$$

$$p(K_{ij}) = [1 - r - f_-]\,\delta(K_{ij}) + r\,\delta(K_{ij} - 1) + f_-\,\delta(K_{ij} + w_{cl}). \qquad (6.29)$$


We are interested in the properties of the above Hamiltonian 6.26 in the limit of large $N$. Below we study
it within the Bethe-Peierls approximation. Let $P(h)$ ($\bar{P}(\bar{h})$) denote the probability of an
internal (cavity) field acting on an $x$ ($\bar{x}$) spin. Then, according to the zero-temperature cavity
method [106], we have:


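$$P(h) = \int \prod_{k=1}^{N} \left[ dJ_{0k}\, p(J_{0k})\, dh_k\, P(h_k) \right] \prod_{k=1}^{N} \left[ dK_{0k}\, p(K_{0k})\, d\bar{h}_k\, \bar{P}(\bar{h}_k) \right] \delta\!\left( h - \sum_{k=1}^{N} \varphi[h_k, J_{0k}] - \sum_{k=1}^{N} \varphi[\bar{h}_k, K_{0k}] \right), \qquad (6.30)$$

where $\varphi[a, b] = \mathrm{sign}(a) \min[|a|, b]$, and where $h_k$ (resp. $\bar{h}_k$) are the fields
acting on the $x$-spin from other $x$-spins (respectively from $\bar{x}$-spins).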
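
The deterministic part of this update, namely the field on a spin as a sum of clipped neighbor fields, can
be sketched in a few lines. This is an illustration only: it assumes that the delta function in (6.30) sets
$h$ equal to the sum of the clipped terms from both kinds of neighbors, and the function names are
hypothetical.

```python
def clip_field(a, b):
    """sign(a) * min(|a|, b), the clipping function defined with (6.30)."""
    sign = (a > 0) - (a < 0)
    return sign * min(abs(a), b)


def cavity_field(same_fields, same_couplings, other_fields, other_couplings):
    """Field on one spin given its neighbors' cavity fields and couplings.

    same_*  : fields h_k and couplings J_0k from neighbors in the same group
    other_* : fields and couplings K_0k from neighbors in the other group
    """
    return (sum(clip_field(h, j) for h, j in zip(same_fields, same_couplings))
            + sum(clip_field(h, k) for h, k in zip(other_fields, other_couplings)))
```

A population-dynamics solver would repeatedly draw neighbor fields from the current populations, draw
couplings from (6.28) and (6.29), and replace a randomly chosen population member with the value returned by
`cavity_field`.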

Once $P(h)$ is found, we obtain the first two moments as

$$m = \int dh\, P(h)\, \mathrm{sign}(h), \qquad q = \int dh\, P(h)\, \mathrm{sign}^2(h), \ldots \qquad (6.31)$$

Here $m$ is the magnetization averaged over the graph structure including the constraints (i.e., over
$J_{ij}$, $\bar{J}_{ij}$ and $K_{ij}$) and over the Gibbs distribution, which in the zero-temperature case
considered here means averaging over all configurations of $x_i$ and $\bar{x}_i$ that, in the thermodynamic
limit, have the same (minimal) value of the Hamiltonian $H$.

In (6.31) $m$ is the magnetization, while $q$ is called the EA (Edwards-Anderson) order parameter; $q$
differs from 1 due to a possible contribution $\propto \delta(h)$ in $P(h)$. Note that the accuracy of the
clustering (i.e., the probability that a node has been assigned to the correct cluster) is simply $(1+m)/2$.
Thus, $|m| = 1$ corresponds to perfect clustering, whereas $m = 0$ means that the discovered clusters have
only random overlap with the true cluster assignments.

Equation 6.30 cannot be solved analytically for arbitrary $M$. Below we study two specific cases: $M = 2$,
where some analytical insights can be obtained, and $M = \infty$, where we employ population dynamics
to study the properties of $P(h)$.

First, we consider the case of soft constraints, $M = 2$. The results are shown in Figure 6.6(a), where
we plot the magnetization $m$ as a function of $\alpha$ for $\gamma = 1$. For $\rho = 0$, which corresponds
to the unsupervised scenario, there is a critical value $\alpha_c$ of $\alpha$ below which the magnetization
$m$ is zero. Recall that the clustering accuracy (i.e., the fraction of correct cluster assignments) is given
as $(1+m)/2$. Thus, for any $\alpha < \alpha_c$ the estimation cannot do any better than random guessing. At
a certain value of $\rho$, the detection threshold becomes $\alpha = \gamma$. If $\rho$ is increased even
further, the model has a non-zero magnetization even when $\alpha < \gamma$. Note that this shift suggests a
highly non-linear effect of the added constraints, depending on the network parameters. Indeed, when the
connectivity is close to its critical value, the constraints can significantly improve the clustering
accuracy by moving the system away from the critical regime. When the system is away from the critical
region to start with, the addition of the constraints might yield no improvement at all compared to the
unsupervised scenario.


In Figure 6.6(b) we plot the detection boundaries on the $(\alpha, \gamma)$ plane for different values of
$\rho$. For each value of $\rho$, the corresponding boundary $\gamma_c(\alpha; \rho)$ separates two regimes,
so that points below (above) the separator correspond to detectable (undetectable) clusters. The diagonal
line $\alpha = \gamma$ is drawn for comparison. In the unsupervised case, the detection boundary starts at
$(\alpha, \gamma) = (0, 1)$ and asymptotically behaves as $\alpha - \gamma \propto \sqrt{\alpha + \gamma}$
for large $\alpha + \gamma$. One can see that the presence of supervision keeps the shape of the boundary
intact and simply moves it upwards. Thus, for any fixed $\alpha$, one can shift the threshold to
$\gamma = \alpha$ (and beyond) by labeling an appropriate number of edges. Note also that for a fixed number
of labeled edges, the impact is stronger for sparser graphs and diminishes as the link density increases.

We now consider the case of hard constraints, by setting $M = \infty$. In this case, the cavity equation
involves all the order parameters, which makes its analysis more complicated. Instead, we address
this case by solving the cavity equation using population dynamics [106]. The results are depicted in
Figures 6.7(a) and 6.7(b). We also compare our results to simulations using synthetic data. After generating
random graphs of size $N = 25{,}000$, we find the ground state of the Hamiltonian 6.26 using simulated
annealing.

For a subset of nodes connected by labeled edges, we can determine the relative group membership for
any pair in the group due to the transitivity of the constraints. Therefore, finding node assignments that
satisfy hard link constraints on a graph amounts to a two-coloring problem and can be done efficiently. The
sizes of the connected clusters that form as we add random edges to a graph are well known. For $\rho < 1$,
most clusters are disconnected and the size of the largest cluster is $O(\log N)$. At $\rho = 1$, we reach
the “percolation threshold”, where the size of the largest cluster goes like $O(N^{2/3})$. Once $\rho > 1$,
$O(N)$ nodes belong to one giant connected component. We will investigate the consequences of these different
regimes below.
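
The two-coloring step can be sketched with a breadth-first traversal of the constraint graph: a must-link
propagates the same color, a cannot-link propagates the opposite color. The function name `two_color` and
the edge-list input format are assumptions made for this illustration.

```python
from collections import deque


def two_color(n_nodes, must_links, cannot_links):
    """Assign relative group labels that satisfy all hard constraints.

    Returns (component, color). Nodes with the same component index are tied
    together by constraints; within a component, equal colors mean "same
    group". Raises ValueError if the constraints contradict each other.
    """
    neighbors = [[] for _ in range(n_nodes)]
    for i, j in must_links:
        neighbors[i].append((j, 0))
        neighbors[j].append((i, 0))
    for i, j in cannot_links:
        neighbors[i].append((j, 1))
        neighbors[j].append((i, 1))

    component = [-1] * n_nodes
    color = [0] * n_nodes
    n_components = 0
    for start in range(n_nodes):
        if component[start] != -1:
            continue
        component[start] = n_components
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, flip in neighbors[v]:
                expected = color[v] ^ flip
                if component[w] == -1:
                    component[w] = n_components
                    color[w] = expected
                    queue.append(w)
                elif color[w] != expected:
                    raise ValueError("contradictory constraints")
        n_components += 1
    return component, color
```

The running time is linear in the number of nodes plus constraints. The colors are only relative: flipping
all colors inside one component gives an equally valid assignment, which is why constraints alone fix group
membership only up to a per-component sign.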

Looking at the results of Figure 6.7(a), we see that without supervision, as we vary the within-cluster
connectivity $\alpha$, there is a sharp detection threshold (clusters are detectable when $m > 0$). For small
amounts of supervision, $\rho < 1$, the impact of the constraints is to shift the detection threshold to
smaller values of $\alpha$. Qualitatively, this is no different from the effect of adding more unlabeled
edges within clusters. This behavior is expected, since adding hard constraints is equivalent to studying the
same unsupervised clustering problem on a renormalized graph (e.g., merging two nodes that are connected via
constraints). This is in contrast to the results for prior information on nodes in [5], which showed that
even small amounts of node supervision shift the detection threshold to its lowest possible value
$\alpha = \gamma$.

As ρ → 1, there is a qualitative change in our ability to detect clusters. A large number of nodes,
O(N^{2/3}), are connected by labeled edges. If we take the relative labeling of nodes in this largest group
as the “correct” one, then we have a situation similar to node supervision, which, as discussed, moves the
detection threshold to α = γ. While this large labeled component suffices to create non-zero magnetization
in finite graphs (as seen from the simulated annealing results), as N gets large, the effect of this component
diminishes. For ρ > 1, we see that the fraction of nodes contained in the largest labeled component suffices
to produce non-zero magnetization even at the group-defining threshold α = γ.

In Figure 6.7(b), we investigate the location of the detection threshold in the (α, γ) plane. For ρ < 1 we
see that for all values of γ the threshold is simply shifted to a lower value of α, similar to the M = 2 case.
As ρ → 1, the detection threshold approaches the line that defines cluster structure, α = γ. For ρ > 1, a
fraction of nodes are fixed by the edge constraints. Therefore, magnetization is nonzero even when α = γ.

Note that according to our results, addition of constraints does not provide automatic improvement over
the unsupervised case. Indeed, when the cost of violating constraints is finite, the only impact of the added
pair-wise constraints is to lift the detection boundary. Thus, whether adding constraints is beneficial or not
depends on the network parameters. More specifically, consider an unsupervised clustering problem with
network connectivities α and γ, and let A(ρ) = γ − γ_c(α, ρ), where γ_c(α) is the (unsupervised) detection
boundary discussed in Section 1. Then adding ρ constraints per node will be beneficial only if it lifts
the detection boundary above the α = γ line. Note also that for a fixed ρ, the impact of semi-supervision
diminishes in the limit of large α and γ, and the detection threshold re-emerges.


4 Note that fixing M to some large but finite number of O(N) will guarantee that all the constraints are satisfied.
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(a)  (b) 

Figure 6.7: (a) Magnetization plotted against α for different ρ. Lines are generated from population
dynamics and points are generated from simulated annealing. From bottom to top we have ρ = 0, 0.5, 1, 2.
(b) Location of the m = 0 threshold on the (α, γ) plane. Dashed line corresponds to the analytic result
for ρ = 0 and the solid line is α = γ. Squares (circles) calculated using population dynamics at ρ = 0.5
(ρ = 1).
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Chapter  7 

Fighting  Crime 


The  work  presented  in  this  chapter  has  been  done  by  the  group  of  Dr.  Bertozzi  and  Dr.  Brantingham. 

1.  Geographic  Profiling  from  Kinetic  Models  of  Criminal  Behavior 

UCLA postdocs George O. Mohler and Martin B. Short have considered the problem of estimating the
probability density of the “anchor point” (residence, place of work, etc.) of a criminal offender given a set of
observed spatial locations of crimes committed by the offender. Starting from kinetic models of criminal
behavior, they derive the probability density of anchor points using the Fokker-Planck equation and Bayes’
Theorem. Here geographic inhomogeneities such as housing densities and geographic barriers (bodies of
water, parks, etc.) are naturally incorporated into the probability density estimate. The resulting equations
are elliptic PDEs that can be solved efficiently using multigrid or other standard computational techniques.
They test their methodology against distance-to-crime data provided by the Los Angeles Police
Department. Their results highlight the benefits of incorporating elements of criminal behavior and geographic
inhomogeneities  into  profiling  estimates. 
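
The Bayesian step of this approach can be illustrated with a deliberately simplified sketch. It is not the method of Mohler and Short: in place of a Fokker-Planck solution it assumes an isotropic Gaussian distance-decay kernel (the free-space solution for pure diffusion), and it evaluates the posterior only on a finite list of candidate anchor points. The function name, the kernel, and the parameter sigma are all assumptions introduced here; the prior weights stand in for geographic information such as housing density, with zero weight on excluded regions such as water.

```python
import math


def anchor_posterior(crimes, candidates, prior, sigma=1.0):
    """Posterior probability of each candidate anchor point given crime sites.

    crimes:     list of (x, y) crime locations
    candidates: list of (x, y) candidate anchor points
    prior:      non-negative prior weight per candidate (e.g. housing density)
    sigma:      assumed length scale of the Gaussian distance-decay kernel
    """
    log_post = []
    for (gx, gy), weight in zip(candidates, prior):
        if weight <= 0.0:
            log_post.append(-math.inf)  # excluded region (e.g. water)
            continue
        total = math.log(weight)
        for cx, cy in crimes:
            dist_sq = (cx - gx) ** 2 + (cy - gy) ** 2
            # Log-likelihood of one crime; the Gaussian normalizing constant
            # is the same for every candidate and cancels on normalization.
            total -= dist_sq / (2.0 * sigma ** 2)
        log_post.append(total)

    peak = max(log_post)
    if peak == -math.inf:
        raise ValueError("prior excludes every candidate anchor point")
    weights = [math.exp(value - peak) for value in log_post]  # stable exponentiation
    norm = sum(weights)
    return [w / norm for w in weights]
```

Working in log space and subtracting the peak before exponentiating keeps the computation stable when many crime locations are multiplied together.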

2.  PDE  Models  of  Crime 

2.1.  Modeling  of  Urban  Crime  Hotspots 

Terrorist and insurgent activities have a distinct parallel to urban crime in that they are constrained by the
same need to encounter victims and targets in the absence of effective security. Therefore, the fundamental
‘physics’ of criminal offenses may be classified according to the mobility of offenders and potential
targets/victims. Some crime types may arise under a full range of physical conditions (e.g., homicide), while
others are more constrained (e.g., burglary). A similar classification can be developed for terrorist and
insurgent attacks. The models we have developed to study crime should transfer readily to the study of terrorist
and insurgent activity and event patterning. Moreover, we have allied ourselves with LAPD and Long Beach
PD in order to develop direct comparisons between our models and real field data from spatially extended
urban environments. As in the case of criminal offenders, routine mobility patterns are the proximate cause
of how terrorists and insurgents encounter, select and attack targets. We believe that the methods we have
developed to study crime pattern formation may serve as a foundation for predicting spatial and temporal
patterns in terrorist attacks ranging from small-scale events (e.g., sniper, hostage taking) to larger-scale actions
including potential WMD attacks.

Criminologists  have  long  known  of  the  existence  of  crime  “hotspots”:  extended  geographic  regions 
which  display  a  higher  than  average  rate  of  crime,  at  least  temporarily.  While  empirical  measurements  of 
such  hotspots  have  been  numerous,  there  has  been  little  progress  in  understanding  the  precise  mechanisms 
underlying the formation and subsequent dynamics of these spots. In this regard, perhaps the most
well-developed theories have been the exact- and near-repeat crime hypotheses, which posit that a geographic
location  and  its  surrounding  areas  experience  greater  rates  of  criminal  events  for  a  period  of  time  following 
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an  initial  event.  These  hypotheses  have  been  tested  both  by  other  researchers  and  ourselves  using  a  variety 
of  crime  data  from  around  the  world,  and  have  been  found  to  hold  over  a  wide  range  of  cities  and  crime 
types. 

It  has  been  our  goal,  therefore,  to  create  mathematical  models  based  around  these  phenomena  that  may 
shed  light  on  how  and  why  crime  hotspots  form,  and  may  even  enhance  the  predictive  capability  of  law 
enforcement  agencies,  allowing  them  to  better  utilize  their  limited  resources.  Toward  this  end,  we  have 
thus  far  created  two  possible  models  of  crime  hotspot  generation:  an  agent-oriented  model,  and  a  target- 
oriented  model.  The  former  seeks  to  explain  the  empirical  observations  from  first  principles  by  simulating 
the  known  behavior  of  real  criminal  agents  within  a  landscape  of  targets,  while  the  latter  takes  a  more 
empirical  approach  and  uses  the  historical  crime  data  in  an  area  to  predict  future  events  there.  Both  of  these 
approaches  will  be  explained  below. 

The  agent-oriented  model  imagines  a  number  of  criminal  agents  that  move  around  a  virtual  environment, 
occasionally  committing  criminal  acts  as  they  encounter  targets.  Each  target  i  within  the  environment  has 
an attractiveness value A_i associated with it. This attractiveness serves to bias crime in two ways: higher
A_i values directly correspond to higher probabilities of an agent committing a crime when located at target
i, and agents prefer to move toward targets with higher A_i values when walking on the grid. In order to
capture the exact-repeat phenomenon noted above, A_i is temporarily increased after a crime event occurs at
target i. To model the near-repeat phenomenon, A_i is allowed to spatially spread to other targets j that are
near i. Finally, criminal agents “return home” after offending (they are removed from the grid), and each
location gives rise to new agents at a rate Γ.

The  full  mathematical  model  imagined  in  this  discrete  form  possesses  a  number  of  parameters  (seven), 
and  simulations  based  on  this  model  can  exhibit  drastically  different  behavior  depending  upon  the  values 
these parameters take. In essence, though, only three distinct regimes of behavior are observed: no hotspot
formation, transitory hotspot formation, or stationary hotspot formation (see Figure 7.1). To better
understand what parameter combinations lead to the various behavioral regimes, we have recast the discrete model
in continuum form, deriving two coupled partial differential equations to describe the system:


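\frac{\partial A}{\partial t} = \eta \nabla^2 A - A + A^0 + \rho A,    (7.1)

\frac{\partial \rho}{\partial t} = \nabla \cdot \left[ \nabla \rho - \frac{2\rho}{A} \nabla A \right] - \rho A + \bar{A} - A^0,    (7.2)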

where ρ represents the density of criminal agents, and η, A^0, and Ā are the only three remaining parameters
of  the  system  after  the  continuum  limit  is  taken.  Numerical  integrations  of  Eqns.  7.1  and  7.2  result  in 
hotspot  maps  that  are  quite  similar  to  those  created  via  the  fully  discrete  model  for  corresponding  parameter 
choices.  This  work  has  been  published  in  the  journal  M3AS:  Mathematical  Models  and  Methods  in  the 
Applied  Sciences  in  a  special  issue  on  traffic,  crowds,  and  swarms  [150]. 

In addition to their numerical integration, we have performed a variety of analyses of Eqns. 7.1
and 7.2. A linear stability analysis has provided us with the understanding of hotspot formation as a classic
dynamical  instability,  and  given  insights  into  what  parameter  regimes  should  lead  to  hotspot  formation. 
Specifically,  we  now  know  that  if  the  parameters  of  the  system  are  such  that  the  inequality 

A^0 < \frac{2}{3}\bar{A} - \frac{1}{3}\eta\bar{A}^2 - \frac{2}{3}\bar{A}\sqrt{\eta\bar{A}}    (7.3)

holds,  the  system  will  exhibit  hotspots.  The  size  and  spacing  of  these  hotspots  can  also  be  determined  by  the 
three  continuum  parameters.  A  weakly  nonlinear  analysis  of  the  equations  has  also  been  performed,  and  the 
results  show  that  hotspots  may  arise  via  either  supercritical  or  subcritical  pitchfork  bifurcations,  depending 
upon  parameter  values.  The  possibility  of  subcritical  bifurcations  allows  for  hysteresis  effects  within  the 
system,  indicating  that  police  action  may  be  able  to  permanently  destroy  individual  hotspots,  even  if  the 
police  presence  is  eventually  removed  from  the  area.  Finally,  we  have  observed  coarsening  behavior  within 
this  system,  whereby  a  steady  state  comprised  of  n  hotspots  may  spontaneously  change  to  a  state  with  fewer 
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t=730  days,  1575  criminals  t=730  days,  91  criminals  t=730  days,  722  criminals 



(a)  no  hotspots  (b)  transitory  hotspots  (c)  stationary  hotspots 


Figure  7.1:  Example  output  from  simulations  of  the  agent-oriented  model.  These  figures  illustrate  the  three 
regimes  of  behavior  observed  in  this  system. 


than  n  spots,  even  after  very  long  periods  of  seemingly  no  change.  This  could  account  for  the  sometimes 
rapid  disappearance  or  emergence  of  hotspots  observed  in  real  crime  data. 
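
The instability criterion can be checked numerically. The sketch below assumes the continuum equations take the form ∂A/∂t = η∇²A − A + A^0 + ρA and ∂ρ/∂t = ∇·[∇ρ − (2ρ/A)∇A] − ρA + Ā − A^0, linearizes them about the homogeneous state A = Ā, ρ = 1 − A^0/Ā, and evaluates the growth rate of a perturbation with wavenumber k. The function names and the sample parameter values are illustrative choices, not taken from the report.

```python
import math


def hotspot_threshold(eta, a_bar):
    """Right-hand side of inequality (7.3): hotspots are expected when A^0 is below it."""
    return (
        (2.0 / 3.0) * a_bar
        - (1.0 / 3.0) * eta * a_bar ** 2
        - (2.0 / 3.0) * a_bar * math.sqrt(eta * a_bar)
    )


def max_growth_rate(eta, a0, a_bar, wavenumbers):
    """Largest real part of the linear growth rate over the given wavenumbers."""
    rho_bar = 1.0 - a0 / a_bar  # homogeneous steady-state agent density
    best = -math.inf
    for k in wavenumbers:
        k2 = k * k
        # Jacobian of the linearized system acting on (dA, drho) for mode exp(i k x)
        j11 = -eta * k2 - 1.0 + rho_bar
        j12 = a_bar
        j21 = (2.0 * rho_bar / a_bar) * k2 - rho_bar
        j22 = -k2 - a_bar
        trace = j11 + j22
        det = j11 * j22 - j12 * j21
        disc = trace * trace - 4.0 * det
        # Real part of the eigenvalue with the largest real part
        sigma = 0.5 * (trace + math.sqrt(disc)) if disc >= 0.0 else 0.5 * trace
        best = max(best, sigma)
    return best


if __name__ == "__main__":
    eta, a_bar = 0.03, 1.0  # illustrative values only
    ks = [0.1 * i for i in range(1, 101)]
    print("threshold on A^0:", hotspot_threshold(eta, a_bar))
    for a0 in (0.1, 0.9):
        print("A^0 =", a0, "max growth rate =", max_growth_rate(eta, a0, a_bar, ks))
```

A positive maximum growth rate indicates that some wavelength is linearly unstable, which is the regime in which inequality (7.3) holds; values of A^0 well above the threshold should give negative growth rates for every wavenumber.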

PhD student Nancy Rodriguez, under the direction of co-PI Bertozzi, established local existence and
uniqueness of solutions to the continuum version of this model, a coupled system of partial differential
equations, as well as a continuation argument. She compared this PDE model with a generalized version of the
Keller-Segel model for chemotaxis as a first step to understanding possible conditions for global existence
vs. blow-up of the solutions in finite time. Global well-posedness of the model is still an open problem;
however, this work lays the groundwork and distinguishes the nonlinearities present in this model from
prior mathematical work on related problems of bacterial chemotaxis. The paper by Rodriguez and Bertozzi
has been published in M3AS [137].

2.2.  Control  of  Hotspots  by  Law  Enforcement 

We extend an agent-based model of crime-pattern formation initiated in [150] by incorporating the effects
of  law  enforcement  agents.  We  investigate  the  effect  that  these  agents  have  on  the  spatial  distribution  and 
overall  level  of  criminal  activity  in  a  simulated  urban  setting.  Our  focus  is  on  a  two-dimensional  lattice 
model  of  residential  burglaries,  where  each  site  (target)  is  characterized  by  a  dynamic  attractiveness  to 
burglary  and  where  criminal  and  law  enforcement  agents  are  represented  by  random  walkers.  The  dynamics 
of  the  criminal  agents  and  the  target-attractiveness  field  are,  with  certain  modifications,  as  described  in  [150]. 
Here  the  dynamics  of  enforcement  agents  are  affected  by  the  attractiveness  field  via  a  biasing  of  the  walk,  the 
detailed  rules  of  which  define  a  deployment  strategy.  We  observe  that  law  enforcement  agents,  if  properly 
deployed,  will  in  fact  reduce  the  total  amount  of  crime,  but  their  relative  effectiveness  depends  on  the 
number of agents deployed, the deployment strategy used, and the spatial distribution of criminal activity. For
certain policing strategies, continuum PDE models can be derived from the discrete systems. The continuum
models are qualitatively similar to the discrete systems at large system sizes. This work was carried out by
Paul Jones as part of his PhD thesis under the direction of Lincoln Chayes and Jeff Brantingham.

2.3.  Bifurcation  Theory  for  Crime  Hotspots 

Short, Bertozzi, Brantingham and Tita developed non-linear analyses of a PDE model of crime hotspot
formation published during the previous MURI period. This work examines the non-linear stability of
crime hotspots formed by fundamental behaviors of the diffusion of risk associated with crime in uniform,
target-rich crime environments. They show that there are at least two types of parameter regimes that
produce  either  super-critical  crime  hotspots  or  sub-critical  crime  hotspots.  Super-critical  hotspots  emerge  in 
linearly  unstable  regimes  from  small  perturbations  in  crime  and  lead  to  apparent  crime  displacement  when 
existing  hotspots  are  suppressed  by  police  action.  By  contrast,  sub-critical  hotspots  form  in  linearly  stable 
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Figure  7.2:  (left)  Cover  of  March  2,  2010  issue  of  PNAS.  (right)  Crime  hotspot  suppression  figure  from  the 
cover  article  [151].  Suppression  results  for  the  PDE  system  with  parameters  chosen  to  generate  supercritical 
or  subcritical  crime  hotspots.  (A)  Suppression  of  supercritical  crime  hotspots.  Shown  is  the  configuration 
of  supercritical  hotspots  at  timestep  t  =  100,  just  prior  to  the  introduction  of  crime  suppression.  Crime 
suppression  is  then  introduced  over  the  area  of  each  visible  hotspot,  leading  to  the  eradication  of  the  original 
hotspots  but  corresponding  increases  in  risk  in  neighboring  regions,  seen  at  t  =  120.  The  transient  structure 
at  t  =  120  resembles  a  hot  ring  solution  surrounding  the  location  of  the  original  central  hotspot.  By  the  time 
of  the  next  suppression  at  t  =  200,  a  new  steady  state  featuring  hotspots  in  positions  adjacent  to  the  original 
ones  has  been  achieved.  (B)  Suppression  of  subcritical  crime  hotspots.  Shown  is  a  central  subcritical  hotspot 
at t = 100, just prior to the introduction of crime suppression. Crime suppression is then introduced over
the area of the hotspot, leading to the eradication of the hotspot by t = 120. No transient structures appear
in this case. Eventually suppression is lifted at t = 200 and the system quickly adopts the homogeneous
steady  state.  Color  scale  shows  red  as  higher  crime  area  and  blue  as  lower  crime  area. 
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regimes from large perturbations in crime. When sub-critical hotspots are suppressed by police action they
do not reemerge (i.e., they dissipate) even when police pressure is removed. This is a hysteresis effect that is
shown in the non-linear analysis. The technical analysis is published in SIAM Journal on Applied Dynamical
Systems (SIADS). The general features of the model and its implications are discussed in the
March 2010 cover article of the Proceedings of the National Academy of Sciences (PNAS); see Figure 7.2.
The article also received considerable attention from the scientific and international press; see the front
section for references.

2.4.  Extension  to  Game  Theory  Models 

The  evolution  of  human  cooperation  has  been  the  subject  of  much  research,  especially  within  the  framework 
of  evolutionary  public  goods  games,  where  several  mechanisms  have  been  proposed  to  account  for  persistent 
cooperation.  Yet,  in  addressing  this  issue,  little  attention  has  been  given  to  games  of  a  more  adversarial 
nature,  in  which  defecting  players,  rather  than  simply  free  riding,  actively  seek  to  harm  others.  Short, 
Brantingham and D’Orsogna [152] have developed an evolutionary game theoretic model that explores how
populations composed of different criminal and reporting strategies evolve. They use the specific example
of criminal activity, recasting the familiar public goods strategies of punishers, cooperators, and defectors
in this light. They introduce a strategy, “the informant,” with no clear analog in public goods games and
show  that  individuals  employing  this  strategy  are  a  key  to  the  emergence  of  systems  where  cooperation 
dominates.  They  also  find  that  a  defection-dominated  regime  may  be  transitioned  to  one  that  is  cooperation- 
dominated  by  converting  an  optimal  number  of  players  into  informants.  They  discuss  these  findings,  the  role 
of  informants,  and  possible  intervention  strategies  in  extreme  adversarial  societies,  such  as  those  marred 
by  wars  and  insurgencies.  Simulations  demonstrate  that  this  idealized  society  has  two  stable  equilibrium 
points  where  the  population  consists  of  either  (1)  offenders/non-reporters  and  non-offenders/non-reporters 
(“dystopia”),  or  (2)  non-offenders/reporters  and  non-offenders/non-reporters  (“utopia”).  Informants  who 
commit  crimes  but  will  also  report  crimes  that  they  themselves  do  not  commit  are  not  present  in  either  of 
these equilibria. However, simulation and an equivalent ODE system show that informants are critical to
transitioning between dystopia and utopia. This work led to the formation of a new MURI team led by
Milind Tambe at USC starting this year. Current co-PIs Bertozzi and Brantingham are involved with this
program  as  are  former  postdocs  D’Orsogna  and  Short. 

2.5.  Gang  Rivalry  Networks  -  a  Mechanistic  Approach 

PhD students Rachel Hegemann and Laura Smith use an agent-based model to investigate social and physical
geographic influences on gang rivalry network formation in the Eastern Los Angeles policing district of
Hollenbeck. The model includes basic movement routines based around known territory anchor points (‘set
spaces’) provided by the LAPD. Known physical boundaries such as highways are encoded into the model
and are treated as semi-permeable boundaries by the agents. Interactions are recorded between the agents to
produce simulated rivalry networks. These simulated rivalry networks are then compared against the known
rivalry network in Hollenbeck as well as networks produced by alternative methods such as geographic
threshold  graphs.  This  work  also  involves  postdoc  Alethea  Barbaro  and  collaborators  George  Tita  and 
Shannon  Reid  at  UC  Irvine.  It  was  recently  published  in  Physica  A  [67]. 

2.6.  Gang  Territory  Development  Based  on  Graffiti  Distributions 

Barbaro, D'Orsogna and Chayes study the problem of gang territory formation by simulating an interacting
particle system on a lattice. The central hypothesis is that territory formation and defense occurs through
territorial marking, which gangs do through graffiti tagging. We show that gang territories can develop
in reaction to temporally and spatially evolving distributions of graffiti. We study a two-gang model in
which agents deposit distinct graffiti territorial markers, all graffiti decays in time, and agents condition
their movement patterns in response to the graffiti distribution. Using methods from statistical mechanics,
we prove that a phase transition occurs in this system, in which randomly distributed gang members suddenly
segregate into distinct territories.
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3.  Maximum  Penalized  Likelihood  Estimation  and  Data  Fusion 


Figure  7.3:  Residential  burglary  in  2004  for  an  18x18  km  area  of  the  San  Fernando  Valley,  Los  Angeles. 
Point  locations  of  crimes  shown  on  the  far  left.  Middle  figures  compare  density  estimation  using  MPLE  and 
TV  regularization  (left)  with  more  traditional  kernel  based  methods  (right).  Actual  housing  densities  are 
shown  on  the  far  right,  which  could  be  fused  with  the  point  process  data  of  human  activity,  to  create  a  more 
accurate  crime  density  map. 


3.1.  TV  Regularized  MPLE 

Total  Variation-based  regularization,  well  established  for  image  processing  applications  such  as  denoising, 
was  recently  introduced  for  Maximum  Penalized  Likelihood  Estimation  (MPLE)  as  an  effective  way  to 
estimate  non-smooth  probability  densities.  While  the  estimates  show  promise  for  a  variety  of  applications, 
the  non-linearity  of  the  regularization  leads  to  computational  challenges,  especially  in  multi-dimensions. 
George  Mohler,  Andrea  Bertozzi,  Tom  Goldstein  and  Stan  Osher  present  a  numerical  methodology,  based 
upon the Split Bregman L1 minimization technique, that overcomes these challenges, allowing for the fast
and  accurate  computation  of  2D  TV-based  MPLE  (see  Figure  7.3).  We  test  the  methodology  with  several 
examples,  including  V-fold  Cross  Validation  with  large  2D  data  sets,  and  highlight  the  application  of  TV- 
based  MPLE  to  point  process  crime  modeling.  This  work  has  been  accepted  in  J.  Computational  and 
Graphical  Statistics. 
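
For orientation, the generic form of a TV-penalized likelihood problem can be written as below. This is a sketch of the standard MPLE formulation rather than a transcription of the authors' exact functional: the sample points x_i, the domain Ω, and the weight μ are notation introduced here.

```latex
\hat{u} \;=\; \operatorname*{arg\,min}_{\substack{u \ge 0 \\ \int_\Omega u \, dx = 1}}
  \left\{ \int_\Omega |\nabla u| \, dx \;-\; \mu \sum_{i=1}^{n} \log u(x_i) \right\}
```

The first term is the total variation of the density u, which permits sharp jumps, and the second is the negative log-likelihood of the observed events. The non-differentiability of |∇u| is the source of the computational difficulty that splitting methods such as Split Bregman are designed to handle.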

3.2.  Improving  Density  Estimation  by  Incorporating  Spatial  Information 

The  TV  regularized  MPLE  method  described  above  can  be  improved  by  incorporating  additional  spatial 
information. We propose a set of Maximum Penalized Likelihood Estimation methods based on Total
Variation and H1 Sobolev norm regularizers in conjunction with a priori high resolution spatial data to obtain
more geographically accurate density estimates. We apply this method to a residential burglary data set of
the San Fernando Valley using geographic features obtained from satellite images of the region and housing
density information. This work was performed by Laura Smith and Matthew Keegan as part of their PhD
thesis work with advice from Todd Wittman, George Mohler and co-PI Bertozzi. We have published a paper
in  EURASIP  J.  on  Advances  in  Signal  Processing,  special  issue  on  Advanced  Image  Processing  for  Defense 
and  Security  Applications,  2010.  See  Figure  7.4. 

3.3.  Filling  in  Missing  Information  in  Gang  Crime 

Dynamic  activity  involving  social  networks  often  has  distinctive  temporal  patterns  that  can  be  exploited  in 
situations  involving  incomplete  information.  Even  when  activity  is  highly  stochastic,  localized  excitations  in 
parts  of  the  network  can  help  identify  actors  in  cases  of  unknown  origin.  Pinpointing  the  source  of  unknown 
activity  in  large  social  networks  is  a  combinatorially  complex  problem  that  can  be  more  easily  computed  via 
a  non-convex  constrained  optimization.  Gang-related  violent  crimes  pose  a  major  problem  for  authorities  in 
large  cities,  where  cycles  of  retaliatory  violence  can  lead  to  short  but  intense  periods  of  crime. 

The  UCLA  Institute  of  Pure  and  Applied  Mathematics  Research  in  Industrial  Projects  for  Students 
(IPAM-RIPS)  project  team  for  2010  worked  on  the  problem  of  identifying  unknown  parties  (gangs)  in  gang- 
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(a)  San  Fernando  Valley  residential  burglary  kernel 
density  estimation 


(c)  San  Fernando  Valley  residential  burglary  modified 
TV  MPLE  density  estimation 


(b)  San  Fernando  Valley  residential  burglary  TV  MPLE  density  estimation 


(d) San Fernando Valley residential burglary weighted H1 MPLE density
estimation 


Figure 7.4: These images are the density estimates for the San Fernando Valley residential burglary data.
(a) and (b) show the results of the current methods Kernel Density Estimation and TV MPLE, respectively.
The results from our Modified TV MPLE method and our Weighted H1 MPLE method are shown in figures
(c) and (d), respectively. The color scale represents the number of residential burglaries per year per square
kilometer. Figure taken from [155].
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related  violent  incidents.  Reports  from  LAPD  indicate  that  a  significant  number  of  gang-related  crimes  in 
Hollenbeck  have  unknown  perpetrators  with  unknown  gang  affiliation.  The  RIPS-LAPD  project  developed 
an  algorithm  that  LAPD  could  use  to  predict  the  possible  perpetrator’s  gang  in  cases  where  the  crime  is 
marked as gang-related but the suspect gang is not known. The algorithm uses crimes with complete
information to fill in missing data fields for crimes with incomplete information. Spatio-temporal information
about the crimes is used as well as information about gang territories and historical rivalries. The students
created a prototype algorithm that produces an accurate prediction of the gangs involved in individual crimes.

This  problem  was  studied  in  more  detail  by  Alexey  Stomakhin,  Martin  Short,  and  Andrea  Bertozzi  in 
[159].  The  authors  considered  a  model  in  which  the  repeat  activity  between  nodes  of  the  network  is  modeled 
by  a  temporal  Hawkes  process  (Figure  1).  Here,  the  nodes  of  the  network  represent  individual  street  gangs 
and the activities are violent crimes between the gangs, some of which are unsolved. The goal is to correctly
identify the gang affiliated with the unsolved crimes. The authors construct an energy functional inspired
by the true probabilistic likelihood associated with the Hawkes process that depends quadratically on the
probability that an unsolved crime was committed by a specific gang. They maximize this functional under
an ℓ2 constraint using gradient flow. This problem is well-posed, and generally has a unique optimal
solution. The algorithm performs almost identically to a combinatoric approach for small datasets, but runs
in a fraction of the time; for large datasets, the combinatoric approach is computationally infeasible. For
artificial datasets with properties similar to those of the Los Angeles gang network, the algorithm places the
correct  gang  within  the  top  4%  of  likelihood  approximately  80%  of  the  time,  highlighting  the  usefulness  of 
this  method. 

4.  Self-exciting  Point  Process  Models  of  Crime  and  Insurgent  Violence 

4.1.  Self-exciting  Point  Process  Modeling  of  Residential  Burglaries 

Highly clustered event sequences are observed in certain types of crime data, such as burglary and gang
violence, due to crime-specific patterns of criminal behavior. Similar clustering patterns are observed by
seismologists, as earthquakes are well known to increase the risk of subsequent earthquakes, or aftershocks,
near the location of an initial event. We have developed a collaboration with statistician Frederick
Schoenberg at UCLA, who is an expert on space-time clustering in seismology as modeled by self-exciting point
processes. Postdocs Mohler and Short, in collaboration with Brantingham and George Tita (UC Irvine
criminologist), have developed a manuscript illustrating that these methods are well suited for
criminological applications. They use residential burglary data, provided by the Los Angeles Police Department,
to  illustrate  the  implementation  of  self-exciting  point  process  models  in  the  context  of  urban  crime.  For 
this  purpose  they  use  a  fully  non-parametric  estimation  methodology  to  gain  insight  into  the  form  of  the 
space-time  triggering  function  and  temporal  trends  in  the  background  rate  of  burglary.  This  work  has  been 
published  in  the  J.  of  the  Am.  Statistical  Assoc.  [108]. 

4.2.  Self-exciting  Point  Process  Models  for  Gang  Activity 

Gang  violence  has  plagued  the  Los  Angeles  policing  district  of  Hollenbeck  for  over  half  a  century.  With 
sophisticated  models,  police  may  better  understand  and  predict  the  region’s  frequent  gang  crimes.  During 
summer  2009  we  organized  a  summer  REU  (research  experience  for  undergraduates)  project  to  address 
whether  self-excitation  could  be  quantified  in  Hollenbeck’s  gang  rivalries.  A  self-exciting  point  process 
called a Hawkes process was used to model rivalries over time. Figure 7.5 shows computed arrival rate
functions for the Locke-Lowell rivalry determined from police data, together with numerical simulations
of a model for this rivalry. While the Hawkes model is shown to fit the data well, an agent-based model
is also presented which is able to accurately simulate gang rivalry crimes not only temporally but also
spatially. Finally, the students compared random graphs generated by the agent model to existing models
developed to incorporate geography into random graphs. This work was published in SIAM Undergraduate
Research Online [52] by the team of students.
Figure 7.5: On top, a plot of the Locke-Lowell rivalry's crimes over time with the respective arrival rate
function $\lambda(t)$. On bottom, simulated crimes from a Hawkes process with the Locke-Lowell rivalry's
parameters and the corresponding rate function, $\lambda(t)$. Figure from [52].


In summer 2010 Kym Louie, Mark Allenby and Marina Masaki comprised an undergraduate team mentored
by Tim Lucas (Pepperdine) onsite at UCLA, expanding on the work described above to include both
spatial dependence and directionality of the rivalry behavior. The group presented a model to simulate
directed crimes in the 33-gang system of Hollenbeck [6].

4.3.  Self-exciting  Point  Process  Models  for  Insurgent  Activity 

We  recently  purchased  a  copy  of  the  Iraq  Body  Count  Data,  compiled  by  an  organization  dedicated  to 
accurately  recording  all  civilian  deaths  in  Iraq  [35].  The  number  of  fatalities  linked  to  any  event  is  not 
an  estimate  by  the  organization,  but  a  count  corroborated  by  at  least  two  reliable  news  sources.  In  the 
data we consider, from March 20, 2003 to December 31, 2007, there are 15,977 events. Each entry in the
data  contains  a  start  date,  end  date,  minimum  number  of  deaths,  maximum  number  of  deaths,  town  and 
possibly  a  district  of  where  the  event  occurs.  Our  goal  in  this  paper  is  to  analyze  temporal  patterns  of 
civilian  death  reports.  For  this  purpose  we  employ  a  branching  point  process  model  similar  to  those  used 
in  earthquake  analysis.  Here  the  rate  of  events  is  partitioned  into  the  sum  of  a  Poisson  background  rate  and 
a  self-exciting  component  in  which  events  trigger  an  increase  in  the  rate  of  the  process.  More  specifically, 
each  event  generated  by  the  process  in  turn  generates  a  sequence  of  offspring  events  according  to  a  Poisson 
distribution.  Whereas  the  background  rate  is  typically  assumed  to  be  stationary  for  seismic  activity,  such 
an  assumption  is  not  valid  in  the  context  of  civilian  deaths  in  Iraq.  We  propose  three  simple  adjustments  to 
account  for  background  rate  variation  and  compare  the  effectiveness  of  each  model  using  Iraq  Body  Count 
data  from  2003  to  2007.  Our  results  indicate  that  branching  point  processes  are  well  suited  for  modeling  the 
temporal  dynamics  of  violence  in  Iraq.  This  work  was  performed  by  PhD  student  Erik  Lewis  with  help  from 
George  Mohler  and  Andrea  Bertozzi.  Jeff  Brantingham  obtained  the  data  and  helped  with  interpretation  of 
the  data  by  the  mathematics.  The  work  has  just  been  published  in  Security  Journal  [86]. 
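The branching description above (a Poisson background plus Poisson-distributed offspring per event) can be sketched in a few lines of Python. This is only an illustrative sketch, not the estimation procedure used in [86] or [108]: the exponential triggering kernel, the constant background rate, and the names `simulate_hawkes`, `intensity`, `mu`, `eta` and `omega` are all assumptions introduced here.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_hawkes(mu, eta, omega, horizon, seed=0):
    """Simulate event times on [0, horizon) through the branching representation.

    mu    : constant background (immigrant) rate   -- assumed stationary here
    eta   : mean number of offspring per event, must be < 1 for stability
    omega : decay rate of the assumed exponential triggering kernel
    """
    rng = random.Random(seed)
    events = []
    t = rng.expovariate(mu)
    while t < horizon:                      # background events: homogeneous Poisson
        events.append(t)
        t += rng.expovariate(mu)
    queue = list(events)
    while queue:                            # every event spawns Poisson(eta) offspring
        parent = queue.pop()
        for _ in range(poisson(rng, eta)):
            child = parent + rng.expovariate(omega)
            if child < horizon:
                events.append(child)
                queue.append(child)
    return sorted(events)


def intensity(t, events, mu, eta, omega):
    """Conditional rate: background plus the self-exciting sum over past events."""
    return mu + sum(eta * omega * math.exp(-omega * (t - s)) for s in events if s < t)
```

With a stationary background the expected number of events is roughly `mu * horizon / (1 - eta)`; a time-varying background, as proposed in the text for the Iraq data, would replace the constant `mu` by a function of time.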
Chapter 8

Theoretical  Results  in  Quickest 
Changepoint  Detection 


This  chapter  is  intended  to  summarize  our  contributions  in  the  theory  of  changepoint  detection  made  with 
the  support  of  this  grant.  The  work  has  been  performed  by  the  group  of  Dr.  Tartakovsky. 

1.  The  General  Problem  and  Preliminaries 

Throughout this chapter we will focus on the basic iid setting of the changepoint detection problem. The
setting assumes that one is able to sequentially gather a series of independent random observations,
$\{X_n\}_{n\ge 1}$. The observations are such that $X_1, X_2, \ldots, X_\nu$ are each distributed according
to a known density $f$, and $X_{\nu+1}, X_{\nu+2}, \ldots$ each adhere to a density $g \not\equiv f$, also
known. The time instant $0 \le \nu \le \infty$ is referred to as the changepoint, and is assumed unknown;
henceforth, $\nu = \infty$ will mean that all $X_n$'s have density $f$, and $\nu = 0$ that all $X_n$'s have
density $g$. The objective is to detect as quickly as possible that the change is in effect, subject to a
constraint on the risk of sounding a false alarm. A sequential detection procedure is defined as a stopping
time $T$ (defined with respect to the observed data), so that after observing $X_1, X_2, \ldots, X_T$ it
is declared that a change may be in effect.

We will be interested in the following detection procedures. First, Page's [119] Cumulative SUM
(CUSUM) chart. It is based on maximizing the likelihood ratio (LR), and can be defined as the stopping time
$C_A = \inf\{n \ge 1 : V_n \ge A\}$, where $V_n = \max\{1, V_{n-1}\}\,\Lambda_n$, $n \ge 1$, with $V_0 = 1$,
$\Lambda_n = g(X_n)/f(X_n)$ is the LR for the $n$-th data point, and $A > 0$ is a detection threshold, which
determines the procedure's operating characteristics; hereafter in every definition of a detection procedure
it will be assumed that $\inf\{\varnothing\} = \infty$.

Next, the Shiryaev-Roberts (SR) procedure. This procedure is due to the independent work of Shiryaev
[146, 148], who considered the problem of detecting a change in the drift of Brownian motion, and Roberts
[136], who studied detecting a change in the mean of an iid Gaussian sequence. The SR procedure is defined
by the stopping time

$$S_A = \inf\{n \ge 1 : R_n \ge A\}, \tag{8.1}$$

where $A > 0$ and

$$R_n = (1 + R_{n-1})\,\Lambda_n, \quad n \ge 1, \quad \text{with } R_0 = 0. \tag{8.2}$$

Pollak [123] proposed to tweak the SR procedure by starting it off at a random point, $R_0^Q$, sampled
from the quasi-stationary distribution of $\{R_n\}_{n\ge 0}$. The cdf of this distribution, $Q_A(x)$, is
defined as $Q_A(x) = \lim_{n\to\infty} P_\infty(R_n \le x \mid S_A > n)$.
Pollak's [123] tweaked version of the SR procedure, known as the Shiryaev-Roberts-Pollak (SRP)
procedure, is defined by the stopping time

$$S_A^Q = \inf\{n \ge 1 : R_n^Q \ge A\}, \tag{8.3}$$

where

$$R_{n+1}^Q = (1 + R_n^Q)\,\Lambda_{n+1}, \quad n \ge 0, \quad \text{with } R_0^Q \propto Q_A(x). \tag{8.4}$$
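The CUSUM and SR recursions defined above can be sketched as follows. The Gaussian mean-shift model (pre-change N(0,1), post-change N(theta,1)) and the function names are assumptions made for illustration; the optional head start `r` is likewise only a convenience of this sketch, with `r = 0` reproducing the classical SR statistic (8.2).

```python
import math


def likelihood_ratio(x, theta):
    """LR g(x)/f(x) for the assumed pair f = N(0,1), g = N(theta,1)."""
    return math.exp(theta * x - 0.5 * theta * theta)


def cusum_stop(xs, A, theta):
    """First n with V_n >= A, where V_n = max(1, V_{n-1}) * Lambda_n and V_0 = 1."""
    v = 1.0
    for n, x in enumerate(xs, start=1):
        v = max(1.0, v) * likelihood_ratio(x, theta)
        if v >= A:
            return n
    return None  # no alarm within the available sample


def sr_stop(xs, A, theta, r=0.0):
    """First n with R_n >= A, where R_n = (1 + R_{n-1}) * Lambda_n and R_0 = r."""
    stat = r
    for n, x in enumerate(xs, start=1):
        stat = (1.0 + stat) * likelihood_ratio(x, theta)
        if stat >= A:
            return n
    return None
```

Because $1 + x \ge \max\{1, x\}$, the SR statistic started at zero never falls below the CUSUM statistic on the same data, so with a common threshold the SR rule stops no later than CUSUM; in practice the two thresholds are calibrated separately to meet the same false alarm constraint.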

The  performance  of  a  detection  procedure  is  judged  based  on  the  desired  optimality  criteria.  We  will  be 
interested  in  two  described  below. 

Henceforth, let $P_k(\cdot)$ and $P_\infty(\cdot)$ be the probability measures generated by the data
$\{X_n\}_{n\ge 1}$, respectively, when the changepoint is $\nu = k$, $0 \le k < \infty$, and $\nu = \infty$.
Let $E_k[\cdot]$ and $E_\infty[\cdot]$ denote the corresponding expectations.

For a generic detection rule, $T$, the risk of sounding a false alarm is measured via the Average Run
Length (ARL) to false alarm defined as $\mathrm{ARL}(T) = E_\infty[T]$; see Lorden [95]. Let

$$\Delta(\gamma) = \{T : \mathrm{ARL}(T) \ge \gamma\} \tag{8.5}$$

be the class of procedures for which the ARL to false alarm is no less than the desired a priori set level
$\gamma > 1$.

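The first optimality criterion we will be interested in is due to Pollak [123], and it seeks to find
$T_{\mathrm{opt}} \in \Delta(\gamma)$ such that
$\mathrm{SADD}(T_{\mathrm{opt}}) = \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T)$, for every $\gamma > 1$, where
hereafter

$$\mathrm{SADD}(T) = \sup_{0\le\nu<\infty} \mathrm{ADD}_\nu(T), \quad\text{and}\quad \mathrm{ADD}_\nu(T) = E_\nu[T - \nu \mid T > \nu]. \tag{8.6}$$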

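To date, no solution to this problem has been found. Alternatively, three types of asymptotic optimality
are distinguished.

Definition 1.1. A procedure $T^*_{\mathrm{opt}} \in \Delta(\gamma)$ is order-1 asymptotically optimal if

$$\lim_{\gamma\to\infty} \frac{\mathrm{SADD}(T^*_{\mathrm{opt}})}{\inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T)} = 1, \quad\text{i.e.,}\quad \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) = \mathrm{SADD}(T^*_{\mathrm{opt}})\,[1 + o(1)], \quad\text{as } \gamma \to \infty,$$

where hereafter $o(1) \to 0$, as $\gamma \to \infty$.

A procedure $T^*_{\mathrm{opt}} \in \Delta(\gamma)$ is order-2 asymptotically optimal if
$\mathrm{SADD}(T^*_{\mathrm{opt}}) - \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) = O(1)$, as $\gamma \to \infty$,
where hereafter $O(1)$ is bounded, as $\gamma \to \infty$.

A procedure $T^*_{\mathrm{opt}} \in \Delta(\gamma)$ is order-3 asymptotically optimal if
$\mathrm{SADD}(T^*_{\mathrm{opt}}) - \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) = o(1)$, as $\gamma \to \infty$.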


We now describe the second optimality criterion we will consider. The criterion is known as “multi-cyclic
disorder detection in a stationary regime”. Let $T_1, T_2, \ldots$ denote sequential independent applications
of the same stopping time $T$, and let $\mathcal{T}_j = T_1 + T_2 + \cdots + T_j$ be the time of the $j$-th
alarm, $j \ge 1$. Let $I_\nu = \min\{j \ge 1 : \mathcal{T}_j > \nu\}$ so that $\mathcal{T}_{I_\nu}$ is the point
of detection of the true change, which occurs at time instant $\nu$ after $I_\nu - 1$ false alarms have been
raised. Consider $\mathrm{STADD}(T) = \lim_{\nu\to\infty} E_\nu[\mathcal{T}_{I_\nu} - \nu]$, i.e., the limiting
value of the ADD that we will refer to as the stationary ADD (STADD). The multi-cyclic optimality criterion
consists in finding $T_{\mathrm{opt}} \in \Delta(\gamma)$ such that
$\mathrm{STADD}(T_{\mathrm{opt}}) = \inf_{T\in\Delta(\gamma)} \mathrm{STADD}(T)$ for every $\gamma > 1$. For the
basic iid version of the changepoint detection problem, Pollak and Tartakovsky [125] showed the optimum to
be the SR procedure.

The  rest  of  the  chapter  is  devoted  to  providing  a  summary  of  our  accomplishments  made  over  the  course 
of  this  project  for  the  aforestated  problem. 
2. Efficient Performance Evaluation for a Class of Detection Procedures

This  section  is  a  summary  of  the  work  of  Moustakides,  Polunchenko,  and  Tartakovsky  [113],  namely,  the 
part  concerned  with  the  problem  of  efficient  performance  evaluation  for  a  class  of  detection  schemes.  The 
class  is  all  stopping  rules  whose  detection  statistic  is  a  Markov  process;  in  particular,  the  SR  rule,  the  SRP 
procedure  and  the  CUSUM  chart  all  belong  to  this  class.  We  proposed  a  numerical  framework  whereby 
one  can  evaluate  the  performance  of  virtually  any  detection  procedure  and  with  respect  to  any  performance 
index.  Additionally,  the  framework  supplies  a  concise  numerical  method  for  computing  the  quasi-stationary 
distribution,  thus  making  the  SRP  scheme  applicable  in  practice.  This  framework  can  be  of  interest  to  many 
scientists  in  various  disciplines  where  there  is  need  for  an  on-line  detection  of  changes  (or  anomalies)  in 
observed  processes. 

Note  that  though  we  have  confined  ourselves  to  the  case  of  deterministic  unknown  changepoint  v,  there 
is  also  another,  Bayesian  point  of  view,  which  assumes  that  v  is  a  random  variable  with  a  certain  prior 
distribution.  The  methodology  of  Moustakides,  Polunchenko,  and  Tartakovsky  [113]  can  be  (and  was) 
extended to the Bayesian context as well; see Tartakovsky and Moustakides [168], Tartakovsky, Pollak, and
Polunchenko [175], and Polunchenko and Tartakovsky [129].

Consider a generic detection procedure described by the stopping time
$T_A^v = \inf\{n \ge 1 : V_n^v \ge A\}$, where $A > 0$ is the detection threshold, and $\{V_n^v\}_{n\ge 0}$ is a
generic detection statistic generated recursively as $V_n^v = \xi(V_{n-1}^v)\,\Lambda_n$, $n \ge 1$, with
$V_0^v = v \ge 0$. Here $\xi(x)$ is a known sufficiently smooth function such that $\xi(x) > 0$ for any
$x \in [0, A)$, and $v$ is a fixed parameter, referred to as the starting point, or the head start.

Note that $T_A^v$ can be turned into the CUSUM chart by setting $\xi(x) = \max\{1, x\}$, and similarly, the
choice $\xi(x) = 1 + x$ will “do the trick” for any SR-type rule. Hence, one can evaluate any Operating
Characteristic (OC) of any procedure that is a special case of $T_A^v$ simply by choosing the right $\xi(x)$.

Let $A > 0$ and $v \in [0, A)$ be fixed, and define $\phi_d(v) = E_d[T_A^v]$ and
$P_d^\Lambda(t) = P_d(\Lambda_1 \le t)$, where $d \in \{0, \infty\}$; clearly, $\phi_\infty(v) = \mathrm{ARL}(T_A^v)$
and $\phi_0(v) = \mathrm{ADD}_0(T_A^v)$. Let

$$\mathcal{K}_d(x, y) = \frac{\partial}{\partial y} P_d(V_{n+1}^v \le y \mid V_n^v = x) = \frac{\partial}{\partial y} P_d^\Lambda\!\left(\frac{y}{\xi(x)}\right), \quad d \in \{0, \infty\},$$

denote the transition probability density kernel for the homogeneous Markov process $\{V_n^v\}_{n\ge 0}$; note
that both $\mathcal{K}_d(x, y)$, $d \in \{0, \infty\}$, depend on $\xi(x)$.

We now proceed to stating the equations. First, it can be shown that

$$\phi_d(v) = 1 + \int_0^A \mathcal{K}_d(v, y)\,\phi_d(y)\,dy; \tag{8.7}$$

cf. Moustakides, Polunchenko, and Tartakovsky [113].

Next, consider $\mathrm{ADD}_\nu(T_A^v) = E_\nu[T_A^v - \nu \mid T_A^v > \nu]$ for an arbitrary fixed
$\nu \ge 1$; note that $\mathrm{ADD}_0(T_A^v) = \phi_0(v)$ for all $v$. To evaluate $\mathrm{ADD}_\nu(T_A^v)$,
Moustakides, Polunchenko, and Tartakovsky [113] first argue that, since at time instant $\nu \ge 1$ no change is
yet in effect and each observation $X_n$, $1 \le n \le \nu$, is still $f$-distributed, it must be that
$P_\nu(T_A^v > \nu) = P_\infty(T_A^v > \nu)$ for all $v$. Consequently,
$\mathrm{ADD}_\nu(T_A^v) = E_\nu[(T_A^v - \nu)^+] / P_\infty(T_A^v > \nu)$, and therefore we turn attention to
$\delta_\nu(v) = E_\nu[(T_A^v - \nu)^+]$ and $\rho_\nu(v) = P_\infty(T_A^v > \nu)$. For either, it is direct to
see that

$$\delta_\nu(v) = \int_0^A \mathcal{K}_\infty(v, y)\,\delta_{\nu-1}(y)\,dy \quad\text{and}\quad \rho_\nu(v) = \int_0^A \mathcal{K}_\infty(v, y)\,\rho_{\nu-1}(y)\,dy,$$

where $\nu \ge 1$, $\delta_0(v) = \phi_0(v)$ is as in (8.7), and $\rho_0(v) = 1$ for all $v$, since
$P_\infty(T_A^v > 0) = 1$ for all $v$; cf. Moustakides, Polunchenko, and Tartakovsky [113]. As soon as
$\delta_\nu(v)$ and $\rho_\nu(v)$ are found, by the above argument $\mathrm{ADD}_\nu(T_A^v)$ can be evaluated as
the ratio $\delta_\nu(v)/\rho_\nu(v)$. Furthermore, using the values $\mathrm{ADD}_\nu(T_A^v)$ computed for
sufficiently many successive $\nu$'s beginning from $\nu = 0$ and higher, one can also evaluate
$\mathrm{SADD}(T_A^v) = \sup_{0\le\nu<\infty} \mathrm{ADD}_\nu(T_A^v)$, since
$\mathrm{ADD}_\infty(T_A^v) = \lim_{\nu\to\infty} \mathrm{ADD}_\nu(T_A^v)$ is independent of the starting point,
$V_0^v = v$.
Yet another performance measure of much interest is the local (conditional) false alarm probability
$P_\infty(T \le k + m \mid T > k)$, $k \ge 0$, inside a fixed “window” of size $m \ge 1$ ($m = 1$ represents the
probability of an instantaneous false alarm). In particular, $\sup_{k\ge 0} P_\infty(T \le k + m \mid T > k)$ can
serve as an alternative to the ARL to false alarm. See Tartakovsky [165]. One can readily conclude that
$P_\infty(T_A^v \le k + m \mid T_A^v > k) = 1 - \rho_{k+m}(v)/\rho_k(v)$.

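We now proceed to $\mathrm{STADD}(T_A^v)$, i.e., to

$$\mathrm{STADD}(T_A^v) = \frac{\sum_{\nu=0}^{\infty} E_\nu[(T_A^v - \nu)^+]}{E_\infty[T_A^v]},$$

and if we let $\psi(v) = \sum_{\nu=0}^{\infty} E_\nu[(T_A^v - \nu)^+] = \sum_{\nu=0}^{\infty} \delta_\nu(v)$, then
$\mathrm{STADD}(T_A^v) = \psi(v)/\phi_\infty(v)$. It can be shown that

$$\psi(x) = \delta_0(x) + \int_0^A \mathcal{K}_\infty(x, y)\,\psi(y)\,dy, \tag{8.8}$$

cf. Moustakides, Polunchenko, and Tartakovsky [113]. We also note that $\psi(v)$ cannot be computed prior to
$\delta_0(v) = \phi_0(v)$ as the former depends on the latter.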

Consider now randomizing the starting point, $V_0^v = v$, in a fashion analogous to that behind the SRP
stopping time. Let $Q_A(x) = \lim_{n\to\infty} P_\infty(V_n^v \le x \mid T_A^v > n)$ be the cdf of the
corresponding quasi-stationary distribution; note that this distribution exists, as guaranteed by Harris
[66, Theorem III.10.1].

Let $T_A^Q = \inf\{n \ge 1 : V_n^Q \ge A\}$, where $A > 0$ and $V_n^Q = \xi(V_{n-1}^Q)\,\Lambda_n$, $n \ge 1$, with
$V_0^Q \propto Q_A$, and $\xi(x)$ and $Q_A(x)$ are as defined above; note that $T_A^Q$ can be turned into the SRP
procedure by setting $\xi(x) = 1 + x$.

For $T_A^Q$, any OC is dependent upon the quasi-stationary distribution. We therefore first state the equation
that determines $q_A(x) = dQ_A(x)/dx$, the quasi-stationary pdf; the equation can be seen to be

$$\lambda_A\,q_A(y) = \int_0^A q_A(x)\,\mathcal{K}_\infty(x, y)\,dx, \quad\text{subject to}\quad \int_0^A q_A(x)\,dx = 1; \tag{8.9}$$

cf. Moustakides, Polunchenko, and Tartakovsky [113] and Pollak [123]. We note that $q_A(x)$ and $\lambda_A$ are
both unique. Once $q_A(x)$ and $\lambda_A$ are found, one can compute $\bar\phi_\infty = \mathrm{ARL}(T_A^Q)$, and
$\bar\delta = \mathrm{ADD}_0(T_A^Q) = \mathrm{ADD}_\nu(T_A^Q)$, $\nu \ge 1$, i.e., for $T_A^Q$ the detection delay
is independent of the changepoint. We have

$$\bar\phi_\infty = \int_0^A \phi_\infty(x)\,q_A(x)\,dx = \frac{1}{1 - \lambda_A} \quad\text{and}\quad \bar\delta = \int_0^A \delta_0(x)\,q_A(x)\,dx;$$

cf. Moustakides, Polunchenko, and Tartakovsky [113].

The second equality in the above formula for $\bar\phi_\infty$ is due to the fact that, by design, the
$P_\infty$-distribution of the discrete random variable $T_A^Q$ is exactly geometric with parameter
$1 - \lambda_A$; note that $0 < \lambda_A < 1$. Put otherwise, $P_\infty(T_A^Q > \nu) = \lambda_A^\nu$, where
$\nu \ge 0$; in general, $\lim_{A\to\infty} \lambda_A = 1$. Also, Pollak and Tartakovsky [127] provide sufficient
conditions for $\lambda_A$ to be an increasing function of $A$; in particular, they show that if the cdf of
$\log \Lambda_1$ under measure $P_\infty$ is concave, $\lambda_A$ is increasing in $A$.
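As an illustration of the eigenvalue problem (8.9), the sketch below discretizes the kernel on subinterval midpoints and runs a plain power iteration for an SR-type rule ($\xi(x) = 1 + x$). It is not the authors' solver. To have a closed-form check, it assumes the exponential pair $f(x) = e^{-x}$, $g(x) = 2e^{-2x}$ that the report uses in Theorem 3.2, for which $\Lambda_1 = 2e^{-X_1}$ is Uniform(0, 2) under $P_\infty$; for $A \le 2$ a short calculation (done here, not quoted from the source) then gives a uniform quasi-stationary density and $\lambda_A = \tfrac12\log(1 + A)$. The function names are hypothetical.

```python
import math


def uniform02_cdf(t):
    """P_inf(Lambda_1 <= t) when Lambda_1 ~ Uniform(0, 2) (assumed exponential model)."""
    return min(max(0.5 * t, 0.0), 1.0)


def quasi_stationary(A, lr_cdf, N=100, iters=50):
    """Power iteration for lambda_A * q_A(y) = int_0^A q_A(x) K_inf(x, y) dx, xi(x) = 1 + x.

    Returns (lambda_A, density values at the N subinterval midpoints).
    """
    h = A / N
    nodes = [(i + 0.5) * h for i in range(N)]
    # K[i][j] = P_inf(next statistic lands in subinterval j | current statistic = nodes[i])
    K = []
    for x in nodes:
        scale = 1.0 + x
        cdf = [lr_cdf(j * h / scale) for j in range(N + 1)]
        K.append([cdf[j + 1] - cdf[j] for j in range(N)])
    mass = [1.0 / N] * N          # probability mass per subinterval, sums to one
    lam = 0.0
    for _ in range(iters):
        new = [sum(mass[i] * K[i][j] for i in range(N)) for j in range(N)]
        lam = sum(new)            # surviving mass = eigenvalue estimate
        mass = [m / lam for m in new]
    return lam, [m / h for m in mass]
```

For example, `quasi_stationary(1.0, uniform02_cdf)` should return an eigenvalue close to `0.5 * math.log(2.0)`, so that `1 / (1 - lam)` approximates the ARL to false alarm of the randomized rule; other models only require a different `lr_cdf`.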

To conclude, we have now obtained a set of (exact) integral equations and relations governing all commonly
used performance measures (OC-s) for a broad spectrum of detection procedures; equations (8.7), (8.8),
and (8.9) form the core of the set. These equations are Fredholm (linear) integral equations of the second
kind, which are known to rarely permit an analytical solution. Hence, a numerical solver is in order.
The one offered by Moustakides, Polunchenko, and Tartakovsky [113] is a piecewise-constant (zero-order
polynomial) collocation method with the interval of integration $[0, A]$ partitioned into $N \ge 1$ equally
long subintervals. The collocation nodes are the subintervals' middle points. As this is the simplest case
of the piecewise collocation method (see, e.g., Atkinson and Han [9]), the question of accuracy is a
well-understood one, and tight error bounds can be easily obtained from, e.g., Atkinson and Han [9].
Specifically, it can be shown that the uniform norm of the difference between the exact solution and the
approximate one is $O(1/N)$, provided $N$ is sufficiently large.
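The piecewise-constant collocation scheme described above can be sketched for equation (8.7) with $d = \infty$, i.e., for the ARL to false alarm. The sketch is written from the description in the text rather than from the code of [113]; the Gaussian mean-shift model with $\theta > 0$, the pure-Python linear solver, and all function names are assumptions of this example.

```python
import math


def lr_cdf_pre(t, theta):
    """P_inf(Lambda_1 <= t) for Lambda_1 = exp(theta*X - theta^2/2), X ~ N(0,1), theta > 0."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) + 0.5 * theta * theta) / theta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_row(x, A, N, theta, xi):
    """Integrals of K_inf(x, y) over the N equal subintervals of [0, A)."""
    h, scale = A / N, xi(x)
    cdf = [lr_cdf_pre(j * h / scale, theta) for j in range(N + 1)]
    return [cdf[j + 1] - cdf[j] for j in range(N)]


def solve_linear(M, b):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            if f != 0.0:
                for k in range(c, n + 1):
                    aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def arl_false_alarm(A, theta, xi, v=0.0, N=120):
    """Approximate phi_inf(v) = 1 + int_0^A K_inf(v, y) phi_inf(y) dy by midpoint collocation."""
    h = A / N
    nodes = [(i + 0.5) * h for i in range(N)]
    M = []
    for i, x in enumerate(nodes):
        row = [-k for k in kernel_row(x, A, N, theta, xi)]
        row[i] += 1.0                      # (I - K) phi = 1
        M.append(row)
    phi = solve_linear(M, [1.0] * N)
    start = kernel_row(v, A, N, theta, xi)  # re-apply (8.7) once at the head start v
    return 1.0 + sum(k * p for k, p in zip(start, phi))
```

Passing `xi = lambda x: 1.0 + x` gives an SR-type rule and `xi = lambda x: max(1.0, x)` gives CUSUM. The same matrix can be reused for the other equations of this section, since (8.8) and the recursions for $\delta_\nu$ and $\rho_\nu$ share the kernel $\mathcal{K}_\infty$.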
3. The Shiryaev-Roberts-r Procedure

Since its inception in 1985, the SRP procedure has been (not unfoundedly) believed to be possibly exactly
optimal with respect to $\mathrm{SADD}(T)$ in the class $\Delta(\gamma)$. Although many have tried to prove this
hypothesis, none have succeeded. We show this hypothesis to be false. This conclusion is achieved by
proposing an extension of the SR rule competitive with the SRP procedure, and performing a direct performance
comparison of the two. The idea of the new procedure is to let the SR detection statistic $R_n$ start from
a fixed deterministic point $R_0^r = r \ge 0$. We examined the performance of the resulting SR-r scheme
in relation to the starting point, and proposed an initializing value for which the SRP rule is uniformly
worse. This was demonstrated both numerically and analytically, i.e., by virtue of a counterexample. We
also suggest a starting point for which the SR-r scheme exhibits a faster initial response to early changes.
For details see Moustakides, Polunchenko, and Tartakovsky [113], Polunchenko and Tartakovsky [128],
and Tartakovsky and Polunchenko [169].

We first introduce the SR-r procedure. It is defined as the stopping time
$S_A^r = \inf\{n \ge 1 : R_n^r \ge A\}$, where $A > 0$ and

$$R_n^r = (1 + R_{n-1}^r)\,\Lambda_n, \quad n \ge 1, \quad \text{with } R_0^r = r \ge 0. \tag{8.10}$$

The extra “r” in the name is to emphasize the importance of the starting point (head start). The question
is now: can one design the head start so as to obtain a procedure capable of competing with Pollak's SRP
procedure? The answer is “yes”, which we will explain in the remainder of this section.

First,  recall  that  the  direct  way  to  assess  the  quality  of  a  detection  procedure  is  to  compare  it  against 
the  exact  optimum.  However,  no  exactly  SADD(T)-optimal  procedure  has  yet  been  proposed.  Hence,  an 
alternative  approach  is  in  order.  As  a  point  of  reference  one  could  use  a  lower  bound  on  the  (unknown) 
optimum.  The  following  theorem  shows  that  finding  such  a  bound  is  a  much  easier  task  than  actually 
designing  the  (exactly)  optimal  test. 


Theorem 3.1. Consider $S_A^r$, and let $A = A_\gamma$ be selected so that $\mathrm{ARL}(S_A^r) = \gamma$. Then
$\inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) \ge \underline{\mathrm{SADD}}(S_A^r)$ for every $r \ge 0$, where

$$\underline{\mathrm{SADD}}(S_A^r) = \frac{r\,\mathrm{ADD}_0(S_A^r) + \sum_{\nu=0}^{\infty} E_\nu[(S_A^r - \nu)^+]}{r + \mathrm{ARL}(S_A^r)}. \tag{8.11}$$

Let us now fix the threshold $A > 0$, and propose the following starting point

$$r_A = \arg\inf_{0\le r<A} \left\{ \mathrm{SADD}(S_A^r) - \underline{\mathrm{SADD}}(S_A^r) \right\} \tag{8.12}$$

as a possible candidate for initialization of the SR-r scheme. In other words, we select the value that
brings the two bounds (upper and lower) as close to each other as possible. It can be seen that the resulting
stopping time, $S_A^{r_A}$, is a function of $A > 0$ only, which is set so that the false alarm constraint is
satisfied with equality. In the next subsection we offer a numerical study of the performance of the SR-r
procedure for various values of $r$ using the numerical framework of Moustakides, Polunchenko, and
Tartakovsky [113].

3.1.  An  Example:  Gaussian  Scenario 

Let $\{X_n\}_{n\ge 1}$ be independent unit-variance Gaussian. Specifically, assume $X_1, X_2, \ldots, X_\nu$ are
each $\mathcal{N}(0, 1)$ and $X_{\nu+1}, X_{\nu+2}, \ldots$ are each $\mathcal{N}(\theta, 1)$, where
$\theta \ne 0$ is a known constant.

Apart from the initialization strategies introduced above, namely the classical SR test (with $r = 0$),
the SRP test, the SR-$r_A$ and the SR-$r^*$ procedures, where $r^*$ is the smallest $r$ for which
$\mathrm{ADD}_\nu(S_A^r)$ is an increasing function of $\nu$, we will also examine the case of the SR-$\mu$,
where $r = \mu$, the mean of the quasi-stationary distribution.

Let $\theta = 0.1$ (which corresponds to a relatively faint, not easily detectable change), and consider two
cases: $\gamma = 10^3$ and $\gamma = 10^4$, i.e., moderate and low risk of sounding a false alarm. This
translates into the detection threshold, $A$, being in the range $10^3 \pm 1\%$ and $10^4 \pm 1\%$,
respectively. To solve the corresponding integral equations and obtain the desired OC-s, we partition the
interval $[0, A)$ into $N = 10^4$ (for $\gamma = 10^3$) and $N = 10^5$ (for $\gamma = 10^4$) equidistant nodes.
This is sufficient to provide an accuracy of 0.5% (confirmed by Monte Carlo simulations with $10^6$
repetitions).

Before proceeding with the presentation of our computational results, we would like to mention that
in order to evaluate the ARL to false alarm of the SR-r and SRP procedures, it is important to have a
fairly accurate initial guess, i.e., to obtain a pilot estimate of $\mathrm{ARL}(S_A^r)$ so as to search for
appropriate threshold values in a relatively narrow interval. To this end, the approximation
$\mathrm{ARL}(S_A^r) \approx A/v - r$ is used, where the constant $v \in (0, 1)$ (related to the “overshoot”) is
the subject of renewal theory and can be computed numerically. This approximation can be obtained by noticing
that $R_n^r - n - r$ is a $P_\infty$-martingale with zero expectation, so that by the optional sampling theorem
we have $E_\infty[R^r_{S_A^r} - S_A^r - r] = 0$. Hence $\mathrm{ARL}(S_A^r) = E_\infty[R^r_{S_A^r}] - r$, and,
since $R^r_{S_A^r}$ is the first excess over $A$, renewal theory can be applied to the “overshoot”
$\log R^r_{S_A^r} - \log A$. This approximation was first derived for $r = 0$ in Pollak [124], and its
generalization for any $r \in [0, A)$ is straightforward. For the SRP procedure the value of $r$ should be
replaced by $\mu$, the mean of the quasi-stationary distribution $Q_A$.
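The martingale step invoked above can be spelled out as follows. This is a short worked derivation added for the reader, using only the recursion (8.10) and the fact that the likelihood ratio has unit mean under the pre-change measure; the filtration symbol $\mathcal{F}_n$ (the information in $X_1, \ldots, X_n$) is notation introduced here.

```latex
\begin{align*}
E_\infty[\Lambda_n]
  &= \int \frac{g(x)}{f(x)}\, f(x)\, dx = 1, \\
E_\infty\!\left[R_n^r \mid \mathcal{F}_{n-1}\right]
  &= (1 + R_{n-1}^r)\, E_\infty[\Lambda_n] = 1 + R_{n-1}^r, \\
E_\infty\!\left[R_n^r - n - r \mid \mathcal{F}_{n-1}\right]
  &= R_{n-1}^r - (n-1) - r, \\
\text{hence (optional sampling)}\quad
E_\infty\!\left[R^r_{S_A^r}\right]
  &= E_\infty[S_A^r] + r = \mathrm{ARL}(S_A^r) + r .
\end{align*}
```

Any approximation to the expected value of the statistic at the stopping time therefore translates directly into an approximation to the ARL to false alarm, which is where the renewal-theoretic overshoot constant enters.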


Figure 8.1: $\mathrm{ADD}_\nu(S_A^r)$ for different procedures as a function of the changepoint $\nu$ for
$\theta = 0.1$.

Shown in Figure 8.1(a) is the family of curves $\mathrm{ADD}_\nu(S_A^r)$ versus $\nu$ for all initialization
procedures in question when $\theta = 0.1$ and the ARL to false alarm is $\gamma = 10^3$. Figure 8.1(b) shows
the same for $\gamma = 10^4$. Table 8.1 reports the numerical values obtained by our computational method for
characteristic values of the change time and for the case of ARL to false alarm equal to $10^3$.


Table 8.1: $\mathrm{ADD}_\nu(S_A^r)$ versus $\nu$ for $\gamma = 10^3$ and $\theta = 0.1$.

Procedure \ ν      0      50     100    200    400    600    800    1000
SR               298.5  258.3  230.2  197.7  182.9  181.5  181.4  181.4
SR-r_A           202.8  195.9  196.4  200.1  202.5  202.8  202.8  202.8
SR-r*            174.9  179.9  191.6  205.6  213.1  214.1  214.2  214.3
SR-μ             194.0  190.7  194.6  201.6  205.6  206.0  206.1  206.1
SRP              206.1  (single value reported; constant in ν)


As expected, the SRP procedure is an equalizer. The SR-$r^*$ test has the fastest initial response (for
immediate and early changes), but the worst minimax behavior. The SR-$r_A$ procedure is uniformly better
than all competing strategies including the SRP test. In the latter comparison, even though the difference
is not dramatic, it is visible. It is interesting to note that the SR-$\mu$ rule has an intermediate
performance between SR-$r_A$ and SR-$r^*$, namely a sufficiently fast initial response and a minimax
performance, attained at the steady state, which is the same as that of the SRP test.

Regarding the conventional SR test (with $r = 0$), note that it outperforms all its competing schemes,
including SRP, for sufficiently large change-time $\nu = k$. This is expected since, as we can see from
Figure 8.2(b), when all tests have the same threshold the SR test has the largest ARL to false alarm and the
same steady state value for the expected detection delay. For the other tests, in order to attain the same
ARL to false alarm as the SR test, the thresholds should be increased. This will result in an increase in
the expected detection delay and in particular in the corresponding steady state value. Consequently, the
expected delay of SR, due to its monotone behavior, will attain smaller values than the other tests for
sufficiently large change-time $\nu = k$.

To sum up, the best (in the minimax sense with Pollak's measure of detection delay) performance is
delivered by the SR-$r_A$ procedure. By design, performance-wise this rule is very close to the lower bound
$\underline{\mathrm{SADD}}$ given by (8.11). This suggests that the unknown exactly SADD-optimal procedure can
offer only a practically insignificant improvement over the SR-$r_A$ rule. Furthermore, the example
considered in the next section indicates that the SR-r procedure may, in fact, be the sought optimum.

3.2.  Exact  Optimality  of  the  SR-r  Procedure 

This section constructs an analytical counterexample, which supplies a decisive negative answer to the
question of possible minimax optimality of the SRP procedure.

Theorem 3.2. Let $f(x) = e^{-x}\,\mathbb{1}_{\{x\ge 0\}}$ and $g(x) = 2e^{-2x}\,\mathbb{1}_{\{x\ge 0\}}$. Assume
the SR-r procedure starts off at $r_A = \sqrt{1 + A} - 1$, where $A$ solves the transcendental equation

$$A + (\gamma - 1)\sqrt{1 + A}\,\log(1 + A) - 2(\gamma - 1)\sqrt{1 + A} = 0.$$

Then, for every $1 < \gamma < \gamma_0 = (1 - 0.5\log 3)^{-1} \approx 2.2188$,
$\mathrm{ARL}(S_A^{r_A}) = \gamma$ and the SR-r procedure is minimax, i.e.,
$\mathrm{SADD}(S_A^{r_A}) = \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T)$ for every $\gamma \in (1, \gamma_0)$.

Let the threshold in the SRP procedure be chosen as $A^* = \exp\{2(\gamma - 1)/\gamma\} - 1$. Then
$\mathrm{ARL}(S_{A^*}^Q) = \gamma$ and $\mathrm{SADD}(S_{A^*}^Q) > \mathrm{SADD}(S_A^{r_A})$ for all
$1 < \gamma < \gamma_0$. Therefore, the SRP procedure is suboptimal.

For  another  analogous  result  see  Tartakovsky  and  Polunchenko  [169]. 
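The thresholds in Theorem 3.2 can be checked numerically with a few lines of Python. The two closed-form ARL expressions below are not quoted from the source: they follow from solving (8.7) and (8.9) by hand under the assumption $A < 2$, where the kernel for this exponential pair reduces to $\mathcal{K}_\infty(x, y) = 1/(2(1 + x))$ on $[0, A)$. The function names are hypothetical.

```python
import math

# Upper end of the range of gamma covered by the theorem.
GAMMA_0 = 1.0 / (1.0 - 0.5 * math.log(3.0))


def arl_sr_r(A, r):
    """ARL of SR-r for A < 2 in the exponential model (hand-derived from (8.7))."""
    return 1.0 + A / ((1.0 + r) * (2.0 - math.log1p(A)))


def arl_srp(A):
    """ARL of SRP for A <= 2: 1/(1 - lambda_A) with lambda_A = log(1 + A)/2 (hand-derived)."""
    return 2.0 / (2.0 - math.log1p(A))


def threshold_equation(A, gamma):
    """Left-hand side of the transcendental equation stated in Theorem 3.2."""
    root = math.sqrt(1.0 + A)
    return A + (gamma - 1.0) * root * math.log1p(A) - 2.0 * (gamma - 1.0) * root


def solve_threshold(gamma, tol=1e-12):
    """Bisection on (0, 2); the left-hand side is increasing in A on this interval."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if threshold_equation(mid, gamma) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def srp_threshold(gamma):
    """SRP threshold A* = exp{2 (gamma - 1) / gamma} - 1 from the theorem."""
    return math.exp(2.0 * (gamma - 1.0) / gamma) - 1.0
```

For a given `gamma` in (1, `GAMMA_0`), `A = solve_threshold(gamma)` with head start `math.sqrt(1 + A) - 1` should reproduce `arl_sr_r(A, r) == gamma` up to rounding, and `srp_threshold(GAMMA_0)` equals 2, the largest value the likelihood ratio can take in this model.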

4.  Asymptotic  Optimality  Properties  of  the  Generalized  Shiryaev-Roberts  Procedures 

This section is a logical continuation of the earlier work on the SR-r procedure. For an extended version
of the material presented in this section see Tartakovsky, Pollak, and Polunchenko [175]. Our intent is to
gain a theoretical insight into how the SR rule, the SR-r procedure, and the SRP scheme compare
against one another performance-wise. Specifically, we ask and answer the following questions:

1. Is the stationary expected delay of the repeated SR procedure similar to
$\lim_{\nu\to\infty} \mathrm{ADD}_\nu(S_A)$? (Yes, see Theorem 4.2, Theorem 4.3 and Corollary 4.1.)

2. What can be said about the maximal expected detection delays of the SR, SR-r, and SRP procedures?
(The SRP procedure and the SR-r procedure with a specially designed $r$ are third-order asymptotically
minimax, i.e., to within a negligible term $o(1) \to 0$. See Theorem 4.4. This answer justifies the
conjecture of Moustakides, Polunchenko, and Tartakovsky [113].)

3. What can be said about $\lim_{\nu\to\infty} \mathrm{ADD}_\nu(S_A)$,
$\lim_{\nu\to\infty} \mathrm{ADD}_\nu(S_A^r)$, and $\lim_{\nu\to\infty} \mathrm{ADD}_\nu(S_A^Q)$ when all
have the same ARL to false alarm $\gamma > 1$? (The ADD at infinity is the smallest for the original SR
procedure, $S_A$, but the difference between them is negligible as $\gamma \to \infty$. See Theorems 4.5
and 4.4.)

We will focus on $\mathrm{SADD}(T) = \sup_{0\le\nu<\infty} \mathrm{ADD}_\nu(T)$ and
$\mathrm{ADD}_\infty(T) = \lim_{\nu\to\infty} \mathrm{ADD}_\nu(T)$. We recall that
$\mathrm{ADD}_\nu(T) = E_\nu[T - \nu \mid T > \nu]$, $\nu \ge 0$.
It follows from Pollak [123] that the SRP procedure (8.3) is order-3 asymptotically optimal whenever
$E_0[|\log \Lambda_1|] < \infty$. We offer a proof of the order-3 asymptotic optimality property under the
stronger second moment condition $E_0[|\log \Lambda_1|^2] < \infty$ and using different techniques. The second
moment condition allows one to obtain higher-order asymptotic approximations for $\mathrm{SADD}(S_A^Q)$ and
$\inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T)$ (up to a vanishing term).

More importantly, using the ideas of Moustakides, Polunchenko, and Tartakovsky [113] one is able to design
the initialization point $r = r(\gamma)$ in the SR-r procedure so that it is also order-3 asymptotically
optimal. In this respect, $\mathrm{ADD}_\infty(S_A^r)$ plays a critical role. To understand why, let us look at
Figure 8.2, which shows $\mathrm{ADD}_\nu(S_A^r)$ versus $\nu$ for several initialization values $r$. This
figure was obtained using integral equations and numerical techniques developed by Moustakides,
Polunchenko, and Tartakovsky [113]. If $r = 0$, this is the classical SR procedure whose average detection
delay is monotonically decreasing to its minimum, which is attained at infinity (a steady state value). It is
seen that there exists a value $r = r_A$ that depends on the threshold $A$ for which the worst point $\nu$ is
at infinity, i.e., $\mathrm{SADD}(S_A^{r_A}) = \mathrm{ADD}_\infty(S_A^{r_A})$. The value of $r_A$ is the minimal
value for which this happens, and it is also the value that delivers the minimum to the difference between
$\mathrm{SADD}(S_A^r)$ and the lower bound for $\inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T)$ derived
by Moustakides, Polunchenko, and Tartakovsky [113] and Polunchenko and Tartakovsky [128]. This is a
very important observation, since it allows us to build a proof of asymptotic optimality based on an estimate
of $\mathrm{ADD}_\infty(S_A^{r_A})$.

Figure 8.2: Typical behavior of $\mathrm{ADD}_\nu(T)$ as a function of changepoint $\nu$ for various
initialization strategies.

The  monotonicity  of  the  curve  for  the  ADD  of  the  SR  procedure  allows  us  also  to  conclude  (intuitively 
only  since  this  is  only  a  numerical  observation  and  there  is  no  theoretical  justification  of  monotonicity)  that 
the  asymptotic  lower  bound  for  infTgA(7)  SADD(T)  can  be  evaluated  based  on  the  value  of  ADDoo  (5a). 

Asymptotically,  ADDofA^),  ADD00(5^4),  and  ADDqo(5a)  are  the  same,  since  the  mean  of  the  quasi¬ 
stationary  distribution  is  of  order  O(logA)  and  rA  — >  r*  as  A  — >  oo,  where  r*  is  a  fixed  positive  number. 

4.1.  Two  Useful  Lemmas 

From now on, let $R_\infty$ be a random variable whose distribution is given by the cdf

$$ \mathsf{P}(R_\infty \le x) = \lim_{n\to\infty} \mathsf{P}_\infty(R_n^r \le x) =: Q_{\mathrm{ST}}(x), $$

where $Q_{\mathrm{ST}}(x)$ is the stationary distribution of $R_n^r$. Also, hereafter assume that $Q_A(x)$ (the quasi-stationary
distribution) and $Q_{\mathrm{ST}}(x)$ both exist; note that this is the case when $\Lambda_1$ is continuous.

Lemma 4.1. For any $r \ge 0$, $\lim_{n,A\to\infty} \mathsf{P}_\infty(R_n^r \le x \mid \mathcal{S}_A^r > n) = Q_{\mathrm{ST}}(x)$ at all continuity points of $Q_{\mathrm{ST}}(x)$.

Lemma 4.2. The mean of the quasi-stationary distribution,

$$ \mu_A = \int_0^A x \, dQ_A(x), $$

is upper-bounded by $O(\log A)$, i.e., $\mu_A \le O(\log A)$ as $A \to \infty$, where $O(\log A)/\log A$ is bounded as
$A \to \infty$.

4.2. Average Run Length to False Alarm

We now present asymptotic approximations for the ARL to false alarm of the SR-r procedure, $\mathcal{S}_A^r$, and for
that of the SRP procedure, $\mathcal{S}_A^Q$. Hereafter, let $Z_i = \log \Lambda_i$ denote the log-LR for the i-th observation and
define $S_n = Z_1 + \cdots + Z_n$. Introduce a one-sided stopping time $\tau_a = \inf\{n \ge 1 : S_n \ge a\}$, $a \ge 0$. Let
$\kappa_a = S_{\tau_a} - a$ be the overshoot (excess over the level a at stopping), and let

$$ \zeta = \lim_{a\to\infty} \mathsf{E}_0[e^{-\kappa_a}] \quad \text{and} \quad \varkappa = \lim_{a\to\infty} \mathsf{E}_0[\kappa_a] \tag{8.13} $$

be the limiting exponential overshoot and the limiting overshoot, respectively. Each is a constant determined
by the model and can be computed numerically; in general, $0 < \zeta < 1$ and $\varkappa > 0$.

Theorem 4.1. Provided $r_A = o(A)$, and assuming $A \to \infty$, it is true that $\mathrm{ARL}(\mathcal{S}_A^r) = (A/\zeta)[1 + o(1)]$
uniformly in $0 \le r \le r_A$ and $\mathrm{ARL}(\mathcal{S}_A^Q) = (A/\zeta)[1 + o(1)]$.

For practical purposes we recommend using

$$ \mathrm{ARL}(\mathcal{S}_A^r) \approx A/\zeta - r \quad \text{and} \quad \mathrm{ARL}(\mathcal{S}_A^Q) \approx A/\zeta - \mu_A. \tag{8.14} $$

4.3.  Average  Delay  to  Detection  and  Asymptotic  Optimality 

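We now proceed to obtaining asymptotic approximations for $\mathrm{ADD}_\nu(\mathcal{S}_A^r)$, $\nu \ge 0$, including $\mathrm{ADD}_\infty(\mathcal{S}_A^r)$. To
judge whether the SR-r procedure with a certain head start, $r = r_A$, is asymptotically order-3 optimal, we
will also derive an asymptotic lower bound for $\inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T)$.

From now on, let $L_\infty = \sum_{j=1}^{\infty} e^{-S_j}$ and let $I = \mathsf{E}_0[Z_1]$ denote the Kullback-Leibler information
number. Also, let $S_n^j = \sum_{i=j}^{n} Z_i$.

Lemma 4.3. Let $\mathsf{E}_0[|Z_1|^2] < \infty$ and assume that $Z_1$ is non-arithmetic. Let $0 < N_A < A$ be such that
$N_A/(A^{1-\delta} \log A) \to \infty$ and $N_A = o(A/\log A)$ as $A \to \infty$ for some $\delta \in (0,1)$. Let $r \ge 0$. Then, as
$A \to \infty$,

$$ \mathsf{E}_\nu[\mathcal{S}_A^r - \nu \mid \mathcal{S}_A^r > \nu, R_\nu^r] = \frac{1}{I}\left\{ \log A + \varkappa - \log(1 + R_\nu^r) - \mathsf{E}_\nu\!\left[ \log\!\left(1 + \frac{L_\infty}{1 + R_\nu^r}\right) \,\Big|\, \mathcal{S}_A^r > \nu, R_\nu^r \right] \right\} + o(1), \tag{8.15} $$

where $o(1) \to 0$ as $A \to \infty$ uniformly on $\{N_A \le \nu < \infty,\ R_\nu^r < A/N_A,\ 0 \le r < \infty\}$.

Let

$$ C_\infty = \mathsf{E}[\log(1 + R_\infty + L_\infty)] = \int_0^\infty \!\! \int_0^\infty \log(1 + x + y) \, dQ_{\mathrm{ST}}(x) \, dQ(y), \tag{8.16} $$

where $Q(y) = \mathsf{P}_0(L_\infty \le y)$.

The following theorem provides asymptotic approximations for $\mathrm{ADD}_\infty(\mathcal{S}_A^r)$ and for $\mathrm{ADD}_0(\mathcal{S}_A^Q)$.

Theorem 4.2. If $\mathsf{E}_0[|Z_1|^2] < \infty$ and $Z_1$ is non-arithmetic, then for any $r \ge 0$ and as $A \to \infty$,

$$ \mathrm{ADD}_\infty(\mathcal{S}_A^r) = \frac{1}{I}(\log A + \varkappa - C_\infty) + o(1) \quad \text{and} \quad \mathrm{ADD}_0(\mathcal{S}_A^Q) = \frac{1}{I}(\log A + \varkappa - C_\infty) + o(1). $$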


Let

$$ J(T) = \frac{\sum_{\nu=0}^{\infty} \mathrm{ADD}_\nu(T)\, \mathsf{P}_\infty(T > \nu)}{\mathrm{ARL}(T)}. \tag{8.17} $$

The following lemma proposes a lower bound for $\mathrm{SADD}(T)$ in the class $\Delta(\gamma)$, $\gamma > 1$. This bound will
be used to obtain an asymptotic lower bound in Theorem 4.3 and to prove order-3 asymptotic optimality of
the detection procedures in Theorem 4.4.

Lemma 4.4. Consider $\mathcal{S}_A$, and let $A = A_\gamma$ be chosen so that $\mathrm{ARL}(\mathcal{S}_A) = \gamma$. Then the following
lower bound holds:

$$ \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) \ge J(\mathcal{S}_A). \tag{8.18} $$

The following theorem provides the asymptotic approximation for the lower bound $J(\mathcal{S}_A)$.

Theorem 4.3. Let $J(T)$ be defined as in (8.17), and $C_\infty$ be as in (8.16). If $\mathsf{E}_0[|Z_1|^2] < \infty$ and $Z_1$ is
non-arithmetic, then

$$ J(\mathcal{S}_A) = \frac{1}{I}(\log A + \varkappa - C_\infty) + o(1), \quad \text{as } A \to \infty. $$

The following corollary is a direct consequence of Theorems 4.2 and 4.3.

Corollary 4.1. If $Z_1$ is non-arithmetic and $\mathsf{E}_0[|Z_1|^2] < \infty$, then $\mathrm{ADD}_\infty(\mathcal{S}_A) = \mathrm{STADD}(\mathcal{S}_A) + o(1)$ as
$A \to \infty$, and

$$ \mathrm{STADD}(\mathcal{S}_A) = \frac{1}{I}(\log A + \varkappa - C_\infty) + o(1), \quad \text{as } A \to \infty. $$

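The following theorem establishes asymptotic optimality of the SRP and SR-r detection procedures
under moderate conditions. Its proof is immediate from the above results.

Theorem 4.4. Let $\mathsf{E}_0[|Z_1|^2] < \infty$ and let $Z_1$ be non-arithmetic.

(i) Then

$$ \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) \ge \frac{1}{I}[\log(\gamma\zeta) + \varkappa - C_\infty] + o(1), \quad \text{as } \gamma \to \infty. \tag{8.19} $$

(ii) If in the SRP procedure $A = A_\gamma = \gamma\zeta$, then $\mathrm{ARL}(\mathcal{S}_A^Q) = \gamma[1 + o(1)]$ and

$$ \mathrm{SADD}(\mathcal{S}_A^Q) = \frac{1}{I}[\log(\gamma\zeta) + \varkappa - C_\infty] + o(1), \quad \text{as } \gamma \to \infty. \tag{8.20} $$

Therefore, the SRP procedure is asymptotically order-3 optimal in the class $\Delta(\gamma)$.

(iii) If in the SR-r procedure $A = A_\gamma = \gamma\zeta$, and the initialization point $r = o(\gamma)$ is selected so that
$\mathrm{SADD}(\mathcal{S}_A^r) = \mathrm{ADD}_\infty(\mathcal{S}_A^r)$, then $\mathrm{ARL}(\mathcal{S}_A^r) = \gamma[1 + o(1)]$ and

$$ \mathrm{SADD}(\mathcal{S}_A^r) = \frac{1}{I}[\log(\gamma\zeta) + \varkappa - C_\infty] + o(1), \quad \text{as } \gamma \to \infty. \tag{8.21} $$

Therefore, the SR-r procedure is asymptotically order-3 optimal.

Feasibility of selecting $r_\gamma$ so that $\mathrm{SADD}(\mathcal{S}_A^{r_\gamma}) = \mathrm{ADD}_\infty(\mathcal{S}_A^{r_\gamma})$ follows from numerical experiments
performed by Moustakides, Polunchenko, and Tartakovsky [113] as well as from the example below.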

Remark 4.1. An argument similar to the proof of Theorem 4.2 can be used to show that for the
SR procedure

$$ \mathrm{SADD}(\mathcal{S}_A) = \mathrm{ADD}_0(\mathcal{S}_A) = \frac{1}{I}(\log A + \varkappa - C_0) + o(1), \quad \text{as } A \to \infty, $$

where $C_0 = \mathsf{E}_0[\log(1 + L_\infty)]$. Since $A = \zeta\gamma$ implies $\mathrm{ARL}(\mathcal{S}_A) = \gamma[1 + o(1)]$, it follows that with this
choice of threshold

$$ \mathrm{SADD}(\mathcal{S}_A) = \mathrm{ADD}_0(\mathcal{S}_A) = \frac{1}{I}[\log(\zeta\gamma) + \varkappa - C_0] + o(1), \quad \text{as } \gamma \to \infty. \tag{8.22} $$

Comparing (8.22) with the lower bound (8.19) shows that $\mathrm{SADD}(\mathcal{S}_A) - \inf_{T\in\Delta(\gamma)} \mathrm{SADD}(T) = O(1)$
as $\gamma \to \infty$. Thus, the SR procedure is only second-order asymptotically optimal, and the difference is
approximately equal to $(C_\infty - C_0)/I$. This difference can be quite large when detecting small changes (i.e.,
when I is small).

An interesting question is how $\mathrm{ADD}_\infty(\mathcal{S}_A)$, $\mathrm{ADD}_\infty(\mathcal{S}_A^r)$, and $\mathrm{ADD}_\infty(\mathcal{S}_A^Q)$ are related when all three
procedures have the same ARL to false alarm $\gamma$. The answer is that $\mathrm{ADD}_\infty(\mathcal{S}_A)$ is the smallest, as
established by Theorem 4.5 below. Note also that by Theorems 4.2 and 4.4 the difference between $\mathrm{ADD}_\infty(\mathcal{S}_A)$,
$\mathrm{ADD}_\infty(\mathcal{S}_A^r)$ and $\mathrm{ADD}_\infty(\mathcal{S}_A^Q)$ vanishes as $\gamma \to \infty$.

This result can be proved in two steps: 1) show that $\mathrm{ARL}(\mathcal{S}_A^Q)$ is increasing in A (the fact that the ARL to
false alarm of the SR-r procedure is increasing in A for a fixed r is obvious); and 2) show that $\mathrm{ADD}_\infty(\mathcal{S}_A^r)$
is increasing in A (obviously, the ADDs at infinity are the same for all three procedures, assuming the same
threshold for all three). Since the SR rule, $\mathcal{S}_A$, requires the lowest threshold to attain the same ARL to false
alarm, $\mathrm{ADD}_\infty(\mathcal{S}_A)$ is the lowest. Step 1 can be performed for the most general model, i.e., we can prove
that $\mathrm{ARL}(\mathcal{S}_A^Q)$ is increasing in A in the general case. However, while we believe that $\mathrm{ADD}_\infty(\mathcal{S}_A^r)$ is also
increasing in A in the general case, we are able to prove this fact only for the exponential family with a
certain log-concavity property, which guarantees monotonicity properties of the Markov detection statistics.

For $\mu \ge 0$, regard the sequence defined by the recursion

$$ R_{n+1}^{(\mu)} = (\mu + R_n^{(\mu)}) \Lambda_{n+1}, \quad n \ge 0, \quad R_0^{(\mu)} = r. \tag{8.23} $$

To prove the required result we need the following lemma.

Lemma 4.5. Let $f_\theta(x) = \exp\{\theta x - \psi(\theta)\}$ be a density (with respect to some sigma-finite measure) where
without loss of generality $\psi(0) = \psi'(0) = 0$, and suppose that the corresponding distribution function
$F_{\theta=0}(x)$ is log-concave (i.e., $\log F_0(x)$ is a concave function). Suppose that $g(x) = f_\theta(x)$ for some $\theta > 0$
and that $f(x) = f_{\theta=0}(x)$, so that $\Lambda_i = e^{\theta X_i - \psi(\theta)}$. Then the process $(M_n)_{n \ge 0}$ that has transition
probabilities

$$ \mathsf{P}(M_{n+1} \le x \mid M_n = t) = \mathsf{P}_\infty(R_{n+1}^{(\mu)} \le x \mid R_n^{(\mu)} = t,\ R_{n+1}^{(\mu)} < A) $$

is a stochastically monotone Markov process, i.e., $\mathsf{P}(M_{n+1} > x \mid M_n = t)$ is non-decreasing and
right-continuous in t for all x.

Remark 4.2. Note that the Gaussian distribution is log-concave. Note also that the main issue is
log-concavity, not that g, f belong to an exponential family, since "most" pairs g, f can be embedded into
an exponential family via

$$ f_\theta(x) = \frac{(f(x))^{1-\theta} (g(x))^{\theta}}{\int (f(t))^{1-\theta} (g(t))^{\theta} \, dt} \stackrel{\mathrm{def}}{=} e^{\theta \log \Lambda(x) - \psi(\theta)} f(x), \quad \Lambda(x) = g(x)/f(x), $$

and that without loss of generality one can assume that the observations themselves have been transformed
into $\Lambda_i$ (the likelihood ratios of $X_i$ and of $\Lambda_i$ are the same, and by translation one can obtain
$\psi(0) = \psi'(0) = 0$).

We  can  now  proceed  to  stating  the  desired  result. 

Theorem 4.5. Assume the exponential family and log-concavity conditions of Lemma 4.5. Let $0 < \gamma < \infty$
be fixed, and let $A_r$ be such that the ARL to false alarm of the SR-r procedure $\mathcal{S}_{A_r}^r = \inf\{n \ge 1 : R_n^r \ge A_r\}$
is $\gamma$. Then $\mathrm{ADD}_\infty(\mathcal{S}_{A_r}^r)$ is an increasing function of r and

$$ \min_{r \ge 0} \mathrm{ADD}_\infty(\mathcal{S}_{A_r}^r) = \mathrm{ADD}_\infty(\mathcal{S}_{A_0}) < \mathrm{ADD}_\infty(\mathcal{S}_{A_Q}^Q), $$

where $A_Q$ is such that $\mathrm{ARL}(\mathcal{S}_{A_Q}^Q) = \gamma$.

4.4. Computing Constants $C_0$ and $C_\infty$

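In order to implement the asymptotic approximations we have to be able to compute the constants $C_0$ and
$C_\infty$. To compute $C_0$ and $C_\infty$ we need to find $q_{\mathrm{ST}}(x) = dQ_{\mathrm{ST}}(x)/dx$ and $q_0(x) = dQ_0(x)/dx$. These pdfs
can be found from the equations

$$ q_{\mathrm{ST}}(x) = \int_0^\infty q_{\mathrm{ST}}(y) \, \frac{\partial}{\partial x} P_\infty^{\Lambda}\!\left(\frac{x}{1+y}\right) dy; \qquad q_0(x) = -\int_0^\infty q_0(y) \, \frac{\partial}{\partial x} P_0^{\Lambda}\!\left(\frac{1+y}{x}\right) dy, $$

where $P_d^{\Lambda}(t) = \mathsf{P}_d(\Lambda_1 \le t)$, $d \in \{0, \infty\}$. Both $C_0$ and $C_\infty$ can then be found, e.g., numerically. The next
subsection offers a comparative performance analysis for an example where $C_0$ and $C_\infty$ are computable
analytically.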

4.5.  Accuracy  of  Asymptotic  Approximations:  An  Example 

To test the accuracy of the proposed asymptotic approximations, we carried out an extensive performance
evaluation of the procedures discussed in the earlier sections for the following example. Suppose $\{X_n\}_{n \ge 1}$
is a series of independent observations such that $X_1, X_2, \ldots, X_\nu$ are beta(2, 1) each, and $X_{\nu+1}, X_{\nu+2}, \ldots$
are beta(1, 2) each. Put another way, the series undergoes a sudden and abrupt shift in the expected value
from 2/3 pre-change to 1/3 post-change, while retaining the variance. The pre- and post-change probability
densities for this scenario are $f(x) = 2x \, \mathbb{1}_{\{0 \le x \le 1\}}$ and $g(x) = 2(1 - x) \, \mathbb{1}_{\{0 \le x \le 1\}}$.
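The model above is simple enough to simulate directly. The following is a minimal Monte Carlo sketch, not the numerical method used in this report: it assumes the standard SR-r recursion $R_n = (1 + R_{n-1})\Lambda_n$ with $R_0 = r$ and stopping at $R_n \ge A$, draws the beta variates by inverse-cdf sampling, and uses $\Lambda = g/f = (1 - x)/x$. The function names, the seed, the number of runs, and the small clamp `TINY` are illustrative choices.

```python
import math
import random

TINY = 1e-12  # guards against an exact 0 produced by floating-point rounding


def sample_pre(rng):
    """Draw from beta(2, 1): cdf x^2, so x = sqrt(u) with u in (0, 1]."""
    return math.sqrt(1.0 - rng.random())


def sample_post(rng):
    """Draw from beta(1, 2): cdf 1 - (1 - x)^2, so x = 1 - sqrt(v), v uniform."""
    return 1.0 - math.sqrt(rng.random())


def likelihood_ratio(x):
    """Lambda = g(x) / f(x) = (1 - x) / x for f = 2x and g = 2(1 - x)."""
    x = max(x, TINY)
    return (1.0 - x) / x


def sr_r_stopping_time(A, r, nu, rng, max_steps=10**7):
    """Run SR-r once; nu is the changepoint (None means the change never occurs)."""
    R = r
    for n in range(1, max_steps + 1):
        pre_change = nu is None or n <= nu
        x = sample_pre(rng) if pre_change else sample_post(rng)
        R = (1.0 + R) * likelihood_ratio(x)
        if R >= A:
            return n
    return max_steps


def mean_stopping_time(A, r, nu, runs, rng):
    return sum(sr_r_stopping_time(A, r, nu, rng) for _ in range(runs)) / runs


rng = random.Random(2011)
arl_hat = mean_stopping_time(21.0, 0.0, None, 2000, rng)   # ARL to false alarm
add0_hat = mean_stopping_time(21.0, 0.0, 0, 2000, rng)     # ADD_0 = E_0[T]
print(f"A = 21, r = 0: ARL ~ {arl_hat:.1f}, ADD_0 ~ {add0_hat:.2f}")
```

With r = 0 this is the classical SR procedure, so the two estimates can be compared with the SR column of Table 8.2 for A = 21.0, up to Monte Carlo error.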

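To be specific, our goal is to verify the conditions and the accuracy of the asymptotic approximations
stated in Theorem 4.4, i.e.,

$$ \mathrm{SADD}(\mathcal{S}_A^r) \approx \mathrm{SADD}(\mathcal{S}_A^Q) \approx \frac{1}{I}(\log A + \varkappa - C_\infty) \quad \text{and} \quad \mathrm{SADD}(\mathcal{S}_A) \approx \frac{1}{I}(\log A + \varkappa - C_0), \tag{8.24} $$

and also the approximations (8.14).

To undertake this task, it is necessary to know $C_0$, $C_\infty$, $\zeta$, $\varkappa$, $r_A$ and $\mu_A$. It is rare that $C_0$ and $C_\infty$
can be found analytically, yet for the beta(2, 1)-to-beta(1, 2) model at hand they can: $C_0 = 1$ and
$C_\infty = \pi^2/6 \approx 1.6449$. Also, note that $I = 1$. Thus, $\mathrm{SADD}(\mathcal{S}_A^r) \approx \mathrm{SADD}(\mathcal{S}_A^Q) \approx \log A + \varkappa - 1.6449$ and
$\mathrm{SADD}(\mathcal{S}_A) \approx \log A + \varkappa - 1$.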

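Unfortunately, neither $\varkappa$ nor $\zeta$ is computable exactly. Monte Carlo simulations with $10^6$ trials have
been performed to estimate the two as $\varkappa \approx 1.255$ and $\zeta \approx 0.426$ with the standard error less than $10^{-3}$.
Specifically, these estimates were obtained from the formulas

$$ \varkappa = \frac{\mathsf{E}_0[S_1^2]}{2\,\mathsf{E}_0[S_1]} - \sum_{k=1}^{\infty} \frac{1}{k} \mathsf{E}_0[S_k^-], \qquad \zeta = \frac{1}{I} \exp\left\{ -\sum_{k=1}^{\infty} \frac{1}{k} \left[ \mathsf{P}_0(S_k \le 0) + \mathsf{P}_\infty(S_k > 0) \right] \right\}, $$

where $x^- = -\min(0, x)$; see, e.g., Woodroofe [185].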
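As a rough cross-check of these two constants, one can also simulate the overshoot directly from its definition in (8.13) rather than through the series formulas. The sketch below is an illustration only, not the procedure used to produce the estimates quoted above: it runs the post-change random walk $S_n$ for the beta(2, 1)-to-beta(1, 2) model until it first reaches a finite level a and averages $\kappa_a$ and $e^{-\kappa_a}$. The level a = 8, the number of trials, the seed, and the clamp `TINY` are assumptions chosen for speed, so the results approximate the limits only to about two decimals.

```python
import math
import random

TINY = 1e-12  # keeps x strictly inside (0, 1) so the log-LR stays finite


def post_change_llr(rng):
    """One log-likelihood ratio Z = log[(1 - X) / X] with X ~ beta(1, 2)."""
    x = 1.0 - math.sqrt(rng.random())      # inverse-cdf draw from beta(1, 2)
    x = min(max(x, TINY), 1.0 - TINY)
    return math.log((1.0 - x) / x)


def overshoot(a, rng):
    """Overshoot kappa_a = S_tau - a, where tau = inf{n >= 1 : S_n >= a}."""
    s = 0.0
    while s < a:
        s += post_change_llr(rng)
    return s - a


def estimate_overshoot_constants(a=8.0, trials=20000, seed=1):
    """Return Monte Carlo estimates of E_0[kappa_a] and E_0[exp(-kappa_a)]."""
    rng = random.Random(seed)
    sum_kappa = 0.0
    sum_exp = 0.0
    for _ in range(trials):
        k = overshoot(a, rng)
        sum_kappa += k
        sum_exp += math.exp(-k)
    return sum_kappa / trials, sum_exp / trials


kappa_hat, zeta_hat = estimate_overshoot_constants()
print(f"limiting overshoot ~ {kappa_hat:.3f}, exponential overshoot ~ {zeta_hat:.3f}")
```

The direct simulation trades the truncation of an infinite series for the use of a finite level a; both introduce a small bias in addition to the sampling error.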

Though we can find $C_0$ and $C_\infty$, neither the quasi-stationary distribution, required for the SRP
procedure, nor $\mathrm{ADD}_\nu(T)$ for $\nu \ge 0$ and the ARL to false alarm seem feasible to obtain analytically. To overcome
this difficulty, these quantities were computed numerically, using the approach of Moustakides, Polunchenko,
and Tartakovsky [113] with the number of breakpoints set at $3 \times 10^4$, high enough to ensure a relative
error on the order of a fraction of a percent.

At this point the only unresolved question is how to choose r. Several options have been proposed
in Moustakides, Polunchenko, and Tartakovsky [113], one of which is to set $r = \mu_A$. Recall that
Theorem 4.4 requires (a) $r = o(A)$ as $A \to \infty$ and (b) $\mathrm{SADD}(\mathcal{S}_A^r) = \mathrm{ADD}_\infty(\mathcal{S}_A^r)$. With this choice,
condition (a) is satisfied since, according to Lemma 4.2, $\mu_A \le O(\log A)$.

Condition (b) is also satisfied even for small values of the ARL to false alarm. This can be seen from
Figure 8.3, which shows how $\mathrm{ADD}_\nu(T)$ evolves as $\nu$ runs from 0 to 20 for the SRP test and for the SR-r
procedure with $r = \mu_A$. The ARL to false alarm is about 50 for both procedures. Observe that the SR-r
rule attains its supremum as $\nu \to \infty$. Also, the stationary regime kicks in as early as $\nu = 6$, and this is for
$\mathrm{ARL}(T) \approx 50$. Figure 8.3 provides an illustration of Theorem 4.5: $\mathrm{ADD}_\infty(T)$ is indeed the smallest for
the SR procedure, although the difference is not substantial.

Figure 8.3: Results of numerical evaluation of the conditional average detection delay vs. changepoint $\nu$ of
the SR, SRP and SR-r ($r = \mu_A$) procedures for the beta(2, 1)-to-beta(1, 2) model.

Figure 8.4: Results of numerical evaluation of operating characteristics of the SR, SRP and SR-r ($r = \mu_A$)
procedures for the beta(2, 1)-to-beta(1, 2) model.

Shown in Figure 8.4 are the operating characteristics of the procedures of interest expressed via SADD(T)
versus ARL(T), and the lower bound $J(\mathcal{S}_A)$, all plotted against log[ARL(T)]. The range of values of
ARL(T) is from very low (under 10) to as high as $10^4$. The log scale is particularly convenient in this
case because the Kullback-Leibler information number is 1, and from the asymptotic expansions it follows
that the plots of SADD(T) against log[ARL(T)] should be straight diagonal lines with unit slope. This
expected form of dependence is indeed confirmed for ARL(T) above roughly 100, i.e., the point at which the
asymptotics kick in. When ARL(T) < 100, a slight deviation from the straight line is observed. It is also
seen that the performance of the SRP rule and that of the SR-r procedure with $r = \mu_A$ hardly exhibit any
difference.

To better illustrate the performance difference, Table 8.2 provides a summary of selected values of SADD(T)
and $J(\mathcal{S}_A)$. Also presented in parentheses are the corresponding theoretical predictions based on the
asymptotic approximations (8.24) and (8.14).
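The theoretical predictions can be reproduced approximately from the constants quoted in the text. The sketch below is an illustration under stated assumptions: it plugs the rounded values $\varkappa \approx 1.255$ and $\zeta \approx 0.426$, together with $C_0 = 1$, $C_\infty = \pi^2/6$ and $I = 1$, into (8.14) and (8.24). Because the constants are rounded, the output agrees with the parenthesized entries of Table 8.2 only to within a few hundredths. The function names are hypothetical.

```python
import math

KAPPA = 1.255              # limiting overshoot (rounded estimate from the text)
ZETA = 0.426               # limiting exponential overshoot (rounded estimate)
C_0 = 1.0                  # constant for the SR procedure in this model
C_INF = math.pi ** 2 / 6   # constant for the SRP and SR-r procedures
KL_NUMBER = 1.0            # Kullback-Leibler information number I


def arl_approx(A, head_start=0.0):
    """Approximation (8.14): ARL ~ A / zeta - r (use r = mu_A for SRP)."""
    return A / ZETA - head_start


def sadd_approx(A, constant):
    """Approximation (8.24): SADD ~ (log A + kappa - C) / I."""
    return (math.log(A) + KAPPA - constant) / KL_NUMBER


# SR procedure with A = 21.0 (head start r = 0)
print(round(arl_approx(21.0), 3), round(sadd_approx(21.0, C_0), 3))
# SR-r procedure with A = 21.5 and r = mu_A = 2.037
print(round(arl_approx(21.5, 2.037), 3), round(sadd_approx(21.5, C_INF), 3))
```

Setting `head_start` to 0 recovers the SR case, and passing `C_0` or `C_INF` selects between the second-order (SR) and third-order (SRP, SR-r) constants.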


Table 8.2: Operating characteristics of the SR, SRP and SR-r procedures for the beta(2, 1)-to-beta(1, 2)
model. Numbers in parentheses are the corresponding theoretical values computed using the asymptotic
approximations.

Test          Quantity    γ = 50            γ = 100           γ = 500             γ = 1000            γ = 10000
SR            A           21.0              42.0              212.0               424.5               4256.0
              ARL(T)      50.412 (49.342)   99.832 (98.684)   499.866 (498.12)    999.797 (997.415)   9999.675 (10000.0)
              SADD(T)     3.407 (3.312)     4.051 (4.005)     5.622 (5.615)       6.309 (6.308)       8.607 (8.611)
SRP           A           21.5              43.0              213.5               426.5               4259.0
              ARL(T)      49.635 (48.48)    99.664 (98.431)   499.424 (497.595)   999.87 (997.404)    9999.81 (10000.066)
              SADD(T)     2.942 (2.668)     3.534 (3.361)     5.021 (4.97)        5.692 (5.663)       7.965 (7.966)
SR-r          A           21.5              43.0              213.5               426.5               4259.0
              r = μ_A     2.037             2.603             4.052               4.711               6.982
              ARL(T)      49.554 (48.48)    99.582 (98.431)   500.52 (497.595)    999.792 (997.404)   9999.735 (10000.066)
              SADD(T)     2.942 (2.668)     3.534 (3.361)     5.023 (4.97)        5.692 (5.663)       7.965 (7.966)
Lower Bound               2.939 (2.668)     3.523 (3.361)     5.017 (4.97)        5.688 (5.663)       7.965 (7.966)


Chapter 9

Theoretical Results in Distributed Quickest Changepoint Detection


The  work  presented  in  this  chapter  is  the  result  of  collaboration  of  Dr.  Tartakovsky  (USC)  and  Dr.  Veeravalli 
(Illinois). 


1.  The  General  Problem  and  Preliminaries 


While the quickest changepoint detection problem has been studied for over fifty years, there has been
little prior work on theoretical extensions to the distributed sensor setting and general stochastic models
for observations. Our goal was to develop novel procedures for change detection and isolation, to investigate
their properties for general multipopulation and distributed stochastic models, and to provide an analytical
framework to predict their performance in terms of the tradeoff between detection delay and frequency of
false alarms.


To address this goal, we have performed analysis of several generalizations of the change detection
problem that arise in the applications to distributed sensor systems. Specifically, we consider the distributed
multisensor system with N sensors, $S_1, \ldots, S_N$, communicating with a fusion center, as shown in Figure 9.1.
At time n, an observation $X_i(n)$ is made at sensor $S_i$. The changes in the statistical properties of the
sequences $\{X_i(n)\}$ are governed by an event. We investigate a variety of models for the change process: only
one (or a subset) of the sensors changes, they all change at the same time, or they change at different times.
We also include various scenarios for communication with the fusion center, from the centralized one where
the sensors send sufficient statistics, to the decentralized one where they send quantized observations or
local decisions. We study the role of feedback from the fusion center, and investigate schemes for conserving
energy at the sensors such as switching the sensors between on/off modes and censoring their observations.
This range of possibilities leads to a very interesting set of open problems that are discussed in the
following sections. In order to address the wide range of potential applications of our theory, we accommodate
general statistical models for the observations and allow for different degrees of model uncertainty.

Figure 9.1: Change process detection in sensor network.

2. A Distributed Scenario with no Feedback and Local or Full Memory

In the rest of this section, we will be interested in a particular distributed and decentralized multisensor
scenario where the statistical properties of the sensors' observations change at the same unknown point in
time $\nu$, and no communication between sensors and no feedback between the fusion center and sensors
are allowed, as shown in Figure 9.2. The goal is to detect this change as soon as possible, subject to false
alarm constraints. Thus, there is a distributed N-sensor system in which at time n one observes an
N-component vector stochastic process $(X_1(n), \ldots, X_N(n))$. The i-th component $X_i(n)$, $n = 1, 2, \ldots$,
corresponds to observations obtained from the sensor $S_i$. We will consider two approaches to the decentralized
fusion problem. In the first case, the sensors quantize their observations and these quantized observations
are sent to the fusion center. In this scenario there is only local memory, since quantization is performed
based on current observations (i.e., no past observations participate in quantization). By contrast, in
the second scenario the sensors make local decisions based on all past observations (full memory). These
decisions are then sent to the fusion center for making a final decision.

At an unknown point in time $\nu$ ($\nu = 1, 2, \ldots$) something happens and all of the components
change their distribution. Conditioned on the change point, the observation sequences $\{X_1(n)\}$,
$\{X_2(n)\}$, ..., $\{X_N(n)\}$ are assumed to be mutually independent. Moreover, we assume that, in a
particular sensor, the observations are iid before and after the change (with different distributions). If the
change occurs at $\nu = k$, then in sensor $S_i$ the data $X_i(1), \ldots, X_i(\nu)$ follow density $f_i(x)$, while the
data $X_i(\nu + 1), X_i(\nu + 2), \ldots$ have the common distribution with density $g_i(x)$.

Figure 9.2: Change detection using distributed sensors.

From now on, let

$$ Z_i(n) = \log \frac{g_i(X_i(n))}{f_i(X_i(n))} \tag{9.1} $$

be the log-LR (LLR) between the "change" and "no-change" hypotheses for the n-th observation
from the i-th sensor, and let $I_i = \mathsf{E}_1[Z_i(1)]$ be the Kullback-Leibler (K-L) information number between
the densities $g_i(x)$ and $f_i(x)$.

The asymptotic performance of an optimal centralized detection procedure that has access to all data
is given by

$$ \inf_{T \in \Delta(\gamma)} \mathrm{SADD}(T) = \frac{\log \gamma}{I_{\mathrm{tot}}} [1 + o(1)], \quad \text{as } \gamma \to \infty, \tag{9.2} $$

where $I_{\mathrm{tot}} = \sum_{i=1}^{N} I_i$. See, e.g., Basseville and Nikiforov [16], Siegmund [154], Tartakovsky [162]. This
performance is attained for the centralized CUSUM and SR tests that use all available data.

3.  Centralized  CUSUM  and  Shiryaev-Roberts  Detection  Procedures  for  Known  Parameter 
Values  and  Their  Asymptotic  Minimaxity  Properties 

Under the notation introduced above, the centralized CUSUM (detection) statistic is defined recursively as

$$ W_c(n) = \max\left\{ 0,\ W_c(n-1) + \sum_{i=1}^{N} Z_i(n) \right\}, \quad W_c(0) = 0, \tag{9.3} $$

and the centralized CUSUM (C-CUSUM) test is identified with the stopping time $T_{\mathrm{CS}}^{c}(h) = \inf\{n \ge
1 : W_c(n) \ge h\}$, where $h > 0$ is a detection threshold which controls the FAR.

It follows from Basseville and Nikiforov [16], Lorden [95], Siegmund [154], Tartakovsky [162] that
$\mathrm{ARL}(T_{\mathrm{CS}}^{c}(h)) \ge e^h$ and, hence, $h = \log \gamma$ guarantees $\mathrm{ARL}(T_{\mathrm{CS}}^{c}(h)) \ge \gamma$. Even though this choice of the
threshold is usually conservative, it is useful as a preliminary estimate. Substantial improvements can be
obtained using corrected Brownian motion approximations (Siegmund [154]) and the renewal argument
(Tartakovsky [165]). In particular, it follows from Tartakovsky [165] that, as $h \to \infty$,

$$ \mathrm{ARL}(T_{\mathrm{CS}}^{c}(h)) = \frac{e^h}{I_{\mathrm{tot}} \zeta^2} [1 + o(1)], \tag{9.4} $$

where $I_{\mathrm{tot}} = \sum_{i=1}^{N} I_i$ and $\zeta$ is a constant (depending on the model) that is determined via renewal theory.
Although this approximation is not especially accurate for high FAR, it is satisfactory already for moderate
FAR; cf. Pollak and Tartakovsky [126].

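If the threshold is selected using (9.4), i.e., $h_\gamma = \log(\gamma I_{\mathrm{tot}} \zeta^2)$, then $\mathrm{ARL}(T_{\mathrm{CS}}^{c}(h_\gamma)) \approx \gamma$ and, as $\gamma \to \infty$,

$$ \begin{aligned} \inf_{T \in \Delta(\gamma)} \mathrm{SADD}(T) &\ge \frac{\log \gamma}{I_{\mathrm{tot}}} [1 + o(1)], \\ \mathrm{SADD}(T_{\mathrm{CS}}^{c}(h_\gamma)) &= \mathsf{E}_1[T_{\mathrm{CS}}^{c}(h_\gamma) - 1] = \frac{\log \gamma}{I_{\mathrm{tot}}} + O(1), \end{aligned} \tag{9.5} $$

which means that the C-CUSUM test is asymptotically globally minimax-optimal in the distributed setting.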
The centralized SR test (C-SR) is given by the stopping time

$$ T_{\mathrm{SR}}^{c}(A) = \min\{n \ge 1 : R_c(n) \ge A\}, \tag{9.6} $$

where the SR detection statistic obeys the recursion

$$ R_c(0) = 0, \quad R_c(n) = (1 + R_c(n-1)) \exp\left\{ \sum_{i=1}^{N} Z_i(n) \right\} \quad \text{for } n \ge 1. \tag{9.7} $$

If $A = \gamma\zeta$, then $\mathrm{ARL}(T_{\mathrm{SR}}^{c}(A)) \approx \gamma$ and, as $\gamma \to \infty$, the relations (9.5) hold true for the C-SR procedure,
which means that this test is also asymptotically optimal in the minimax sense.
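The two centralized recursions are short enough to sketch in code. The following is a minimal illustration, assuming the standard CUSUM recursion $W_c(n) = \max\{0, W_c(n-1) + \sum_i Z_i(n)\}$ and the SR recursion (9.7), with alarms raised when the statistic reaches the threshold. The input is a sequence of rows, each holding the N per-sensor LLRs $Z_i(n)$ at one time step; how those LLRs are computed depends on the model and is left to the caller. All function names are hypothetical, and the threshold helpers simply restate $h_\gamma = \log(\gamma I_{\mathrm{tot}} \zeta^2)$ and $A = \gamma\zeta$.

```python
import math


def c_cusum_step(w_prev, llrs):
    """One step of the centralized CUSUM statistic: sum LLRs over sensors, reflect at 0."""
    return max(0.0, w_prev + sum(llrs))


def c_sr_step(r_prev, llrs):
    """One step of the centralized SR statistic, following recursion (9.7)."""
    return (1.0 + r_prev) * math.exp(sum(llrs))


def first_alarm(llr_rows, threshold, step, start=0.0):
    """Return the first time n (1-based) the statistic reaches the threshold, else None."""
    stat = start
    for n, llrs in enumerate(llr_rows, start=1):
        stat = step(stat, llrs)
        if stat >= threshold:
            return n
    return None


def cusum_threshold(gamma, i_tot, zeta):
    """Threshold h = log(gamma * I_tot * zeta^2) suggested by approximation (9.4)."""
    return math.log(gamma * i_tot * zeta ** 2)


def sr_threshold(gamma, zeta):
    """Threshold A = gamma * zeta for the centralized SR test."""
    return gamma * zeta


# Toy input: three time steps, two sensors, made-up LLR values.
rows = [[-0.5, -0.5], [1.0, 0.5], [1.0, 1.0]]
print(first_alarm(rows, 3.0, c_cusum_step))   # CUSUM path: 0.0, 1.5, 3.5
print(first_alarm(rows, 20.0, c_sr_step))     # SR path grows multiplicatively
```

Passing the update rule as the `step` argument keeps the stopping logic identical for both tests, which mirrors the fact that they differ only in the recursion and the threshold scale.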


4.  Analytical  Techniques  for  Non-iid  Cases 

Much of the analysis of change point detection procedures has been restricted to the iid case. The optimality
properties of the CUSUM and SR procedures depend crucially on the iid assumption. However, recent work,
including research by the PIs, has shown that more general models for the distributions can be handled in
the asymptotic setting where $\gamma \to \infty$ (Lai [83], Tartakovsky [163], Tartakovsky and Veeravalli [172]).

In particular, it can be shown that the CUSUM and SR procedures are (first-order) asymptotically
optimal. Furthermore, these asymptotic optimality results extend to arbitrary moments of the delay (Tartakovsky
[163], Tartakovsky and Veeravalli [172]).

Let

$$ Z_n^k = \sum_{j=k}^{n} \sum_{i=1}^{N} \log \frac{g_i(X_i(j) \mid X_i(1), \ldots, X_i(j-1))}{f_i(X_i(j) \mid X_i(1), \ldots, X_i(j-1))}, $$

where in a general non-iid case the densities $g_i(X_i)$ and $f_i(X_i)$ are replaced with the corresponding
conditional densities.

The asymptotic performance results of (9.5) in the iid case rely on the almost sure convergence of
the normalized LLRs $n^{-1} \sum_{j=k}^{k+n-1} \sum_{i=1}^{N} Z_i(j)$ to the K-L information number $I_{\mathrm{tot}}$ as $n \to \infty$. This latter
convergence is guaranteed by the Strong Law of Large Numbers as long as $I_{\mathrm{tot}}$ is finite. To generalize these
results to the non-iid case, we need to assume the convergence of the normalized LLRs $n^{-1} Z_{n+k-1}^k$ to some
positive and finite number q:

$$ \frac{1}{n} Z_{n+k-1}^k \xrightarrow[n \to \infty]{\mathsf{P}_k\text{-a.s.}} q \quad \text{for every } k < \infty. \tag{9.8} $$

Furthermore, we need the following condition on the rate of convergence:

$$ \sum_{n=1}^{\infty} \mathsf{P}_k\left\{ \left| Z_{n+k-1}^k - nq \right| > n\varepsilon \right\} < \infty \quad \text{for every } \varepsilon > 0. \tag{9.9} $$

The convergence implied by (9.8) and (9.9) is also called complete convergence, and can be written
compactly as

$$ \frac{1}{n} Z_{n+k-1}^k \xrightarrow[n \to \infty]{\mathsf{P}_k\text{-completely}} q \quad \text{for every } k < \infty. \tag{9.10} $$

Note that the quantity q plays the role of the total K-L distance between the "change" and "no change"
hypotheses in the non-iid case. In particular, (9.5) can be extended to the non-iid case with $I_{\mathrm{tot}}$ replaced
with q.

It  can  be  shown  that  the  complete  convergence  assumption  is  not  restrictive  and  usually  holds  in  practice, 
especially  under  Markov  and  hidden  Markov  models.  Complete  convergence  allows  us  to  extend  most  of  the 
first  order  asymptotic  approximations  to  non-iid  cases,  and  is  being  used  as  a  powerful  tool  in  this  project  to 
obtain  results  for  realistic  models  that  arise  in  the  applications  to  distributed  sensor  systems. 
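To make the role of q concrete, the sketch below simulates the normalized LLR for a simple Markov example. The model is an assumption introduced only for illustration and is not taken from this report: each sensor observes a mean shift $\theta_i$ in unit-variance Gaussian AR(1) noise with coefficient $\rho_i$, so that the conditional means given the previous observation are $\rho_i X_i(n-1)$ (no change) and $\rho_i X_i(n-1) + \theta_i(1 - \rho_i)$ (change). Under that assumption each LLR increment has post-change mean $\theta_i^2 (1 - \rho_i)^2 / 2$, and summing over sensors gives the candidate limit q. Function names, parameters, and the seed are hypothetical.

```python
import random


def normalized_llr(thetas, rhos, n, seed=0):
    """Simulate (1/n) * sum of conditional LLR increments under the post-change law."""
    rng = random.Random(seed)
    prev = list(thetas)          # start each sensor at its post-change mean level
    total = 0.0
    for _ in range(n):
        for i, (theta, rho) in enumerate(zip(thetas, rhos)):
            shift = theta * (1.0 - rho)        # gap between the conditional means
            m0 = rho * prev[i]                 # conditional mean if no change
            x = m0 + shift + rng.gauss(0.0, 1.0)   # post-change draw
            # log of N(x; m0 + shift, 1) / N(x; m0, 1)
            total += shift * (x - m0) - 0.5 * shift ** 2
            prev[i] = x
    return total / n


def limit_q(thetas, rhos):
    """Candidate limit q = sum_i theta_i^2 (1 - rho_i)^2 / 2 for this toy model."""
    return sum(0.5 * (t * (1.0 - r)) ** 2 for t, r in zip(thetas, rhos))


thetas, rhos = [1.0, 0.5], [0.5, 0.2]
estimate = normalized_llr(thetas, rhos, n=100000)
print(f"normalized LLR ~ {estimate:.4f}, q = {limit_q(thetas, rhos):.4f}")
```

Setting every $\rho_i = 0$ recovers the iid Gaussian case, where q reduces to $\sum_i \theta_i^2/2$, the total K-L number.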


5.  Multichart  Centralized  CUSUM  and  SR  Procedures  for  Unknown  Parameter  Values 


Consider a parametric model with pre-change and post-change densities $f_i^{\mu_i}(x)$ and $g_i^{\theta_i}(x)$, respectively.
In many applications the pre-change parameter values $\mu_i$ can be estimated quite accurately in advance and,
therefore, can be assumed to be known. However, the post-change parameters are seldom known in advance,
and the putative $\theta_i$ is merely a representation of a meaningful change.

If the true post-change parameter value is not equal to the putative value $\theta_i$, then the C-CUSUM and
C-SR detection procedures that are tuned to $\theta_i$ are not optimal anymore. For the sake of simplicity consider
a symmetric case where $\theta_i = \theta$ for all $i = 1, \ldots, N$. In the asymmetric case, the argument is essentially the
same but the notation becomes cumbersome. Write

$$ W_n(\theta) = \max_{1 \le k \le n} \sum_{j=k}^{n} \sum_{i=1}^{N} \log \frac{g_i^{\theta}(X_i(j))}{f_i^{\mu_i}(X_i(j))}, \qquad R_n(\theta) = \sum_{k=1}^{n} \prod_{j=k}^{n} \prod_{i=1}^{N} \frac{g_i^{\theta}(X_i(j))}{f_i^{\mu_i}(X_i(j))} $$

for the CUSUM and SR statistics tuned to the value $\theta$. There are several approaches for composite
post-change hypotheses: (a) a generalized LR approach based on the generalized CUSUM statistic $\sup_\theta W_n(\theta)$
(Dragalin [48], Lorden [95]); (b) a mixture-based CUSUM (or SR) statistic $\int W_n(\theta) \, d\Pi(\theta)$ averaged over a
prior distribution $\Pi(\theta)$ (Pollak [124]); and (c) adaptive CUSUM and SR procedures where the parameter $\theta$ is
replaced with one-stage delayed estimators (Dragalin [47], Lorden and Pollak [96], Tartakovsky [163]).

All the above methods have pros and cons. The generalized likelihood ratio approach is second-order
optimal (Dragalin [48]) but computationally not feasible. The mixture-based approach is also second-order
optimal (Pollak [124]), but may be difficult to implement since it is not always possible to find a conjugate
prior to avoid computational problems. The approach of Lorden and Pollak [96] gives almost optimal
performance, but is also computationally demanding. The adaptive approaches of Dragalin [47] and Tartakovsky
[163] are very simple to implement (recursive) but are not second-order optimal: the performance
degrades dramatically for detecting small changes.

For this reason, we propose to attack this problem from a different standpoint, using a multichart centralized detection test which will be referred to as M-C-CUSUM. Namely, in most applications it is usually possible to define an interval $[\underline{\theta}, \overline{\theta}]$ for the post-change parameter (either using some prior information or designing two values $\underline{\theta}$ and $\overline{\theta}$). For example, the point $\overline{\theta}$ can be selected in such a way that the average detection delay for the values $\theta \ge \overline{\theta}$ is small, so that further optimization is unnecessary. The value of $\underline{\theta}$, in turn, can be selected so that for smaller values it would be rather difficult to detect the change with a reasonable detection delay (indifference zone). Once the interval is set, $M \ge 2$ “reference” points are selected from that interval to run $M$ C-CUSUM tests in parallel, each tuned to the respective point.

To be specific, let $\theta_m \in [\underline{\theta}, \overline{\theta}]$, $m = 1, \ldots, M$, be a set of reference points such that $\theta_m < \theta_{m+1}$. The M-C-CUSUM statistic is then defined as follows. Let $\ell_{i,m}(n) = \log[g_i^{\theta_m}(X_i(n)) / f_i^{\mu_i}(X_i(n))]$ be the LLR tuned to $\theta_m$. First, we define the C-CUSUM statistics for each of the reference points

$$W_m^c(n) = \max\Big\{0,\; W_m^c(n-1) + \sum_{i=1}^{N} \ell_{i,m}(n)\Big\}, \qquad W_m^c(0) = 0,$$

and the corresponding stopping times for each of the latter statistics, $\tau_m^c(h_m) = \min\{n \ge 1: W_m^c(n) \ge h_m\}$. The M-C-CUSUM stopping time is $\tau_{MC}(\mathbf{h}) = \min_{1 \le m \le M} \tau_m^c(h_m)$, where $\mathbf{h} = (h_1, \ldots, h_M)$, $h_m > 0$.
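The parallel-chart construction above can be sketched in a few lines. This is a minimal illustration, not the report's implementation: it assumes the symmetric Gaussian model $\mathcal{N}(0,1) \to \mathcal{N}(\theta,1)$ used later in the report, so that the per-sensor LLR tuned to $\theta_m$ is $\theta_m x - \theta_m^2/2$; the function name `multichart_cusum` and the sample data are hypothetical.

```python
from typing import List, Optional, Sequence


def multichart_cusum(
    observations: Sequence[Sequence[float]],
    reference_points: Sequence[float],
    thresholds: Sequence[float],
) -> Optional[int]:
    """Return the first time n (1-based) at which any chart reaches its threshold."""
    stats: List[float] = [0.0] * len(reference_points)
    for n, x in enumerate(observations, start=1):
        for m, theta in enumerate(reference_points):
            # Assumed Gaussian LLR tuned to theta, summed over the N sensors.
            llr = sum(theta * xi - 0.5 * theta * theta for xi in x)
            stats[m] = max(0.0, stats[m] + llr)
        # The multichart rule stops as soon as one of the M charts alarms.
        if any(w >= h for w, h in zip(stats, thresholds)):
            return n
    return None


# Two sensors, change from mean 0 to mean 2 after five observations.
pre_change = [[0.0, 0.0]] * 5
post_change = [[2.0, 2.0]] * 10
alarm_time = multichart_cusum(pre_change + post_change, [1.0, 2.0], [4.0, 4.0])
```

In this toy run the chart tuned to $\theta = 2$ reacts first, which illustrates why the minimum over charts is taken: each chart is sensitive to changes near its own reference point.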

Clearly, the solution is not unique, since in general there are $M$ different threshold values and only one constraint $\mathrm{ARL}(\tau_{MC}(\mathbf{h})) = \gamma$. Therefore, additional constraints are needed. We will use two approaches. In the first one, we use a common threshold by setting $h_m = h$ for all $m = 1, \ldots, M$. Obviously, in this case the values of $\mathrm{ARL}(\tau_m^c(h))$ are different for different $m$. In the second one, we balance the values of $\mathrm{ARL}(\tau_m^c(h_m))$, in which case the thresholds are found from the equations

$$\mathrm{ARL}(\tau_m^c(h_m)) = M\gamma, \quad m = 1, \ldots, M. \qquad (9.11)$$


In fact, it is possible to show that $h_m$ can be selected so that equations (9.11) hold for sufficiently large $\gamma$, and this approximation is asymptotically accurate as $\gamma \to \infty$. In this balanced case, the following approximate equality for $\mathrm{ARL}(\tau_{MC}(\mathbf{h}))$ holds:

$$\mathrm{ARL}(\tau_{MC}(\mathbf{h})) \approx \Big(\sum_{m=1}^{M} e^{-h_m} I_m \zeta_m^2\Big)^{-1},$$

where $I_m = \sum_{i=1}^{N} \mathsf{E}_{\theta_m}[\ell_{i,m}(1)]$ is the corresponding K-L divergence and $0 < \zeta_m < 1$ is a computable constant related to the limiting “exponential overshoot” in the one-sided test, which is the subject of a renewal-theoretic argument.

The following theorem establishes the asymptotic performance of the M-C-CUSUM detection procedure in these two scenarios. We will need the following additional notation: $J_m(\theta) = \sum_{i=1}^{N} \mathsf{E}_\theta\, \ell_{i,m}(1)$ and $\mathrm{SADD}_\theta(\tau) = \sup_k \mathsf{E}_k^\theta[\tau - k \mid \tau \ge k]$, where $\mathsf{E}_k^\theta$ is expectation when the post-change parameter value is $\theta$.

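Theorem 5.1. Assume that $J_m(\theta)$ is monotonically nondecreasing in $\theta$ for $\theta \ge \theta_m$ and $J_m(\theta) < \infty$ for all $\theta$.

(i) For every $h_m > 0$,

$$\mathrm{ARL}(\tau_{MC}(\mathbf{h})) \ge \frac{1}{\sum_{m=1}^{M} e^{-h_m}}.$$

If, in addition, the LLR is non-arithmetic, then, as $\min_m h_m \to \infty$,

$$\mathrm{ARL}(\tau_{MC}(\mathbf{h})) = \frac{1 + o(1)}{\sum_{m=1}^{M} I_m \zeta_m^2 e^{-h_m}}.$$

(ii) Let $h_m = h_m(v)$ be such that $\lim_{v \to \infty}(h_m/v) = 1$, $m = 1, \ldots, M$, and let $\theta^* \in (-\infty, \overline{\theta})$ be such that $J_1(\theta^*) = 0$. Then for any $\theta > \theta^*$, as $v \to \infty$,

$$\mathrm{SADD}_\theta(\tau_{MC}(\mathbf{h})) = \frac{v}{\max_{1 \le m \le M} J_m(\theta)}\,(1 + o(1)), \qquad (9.12)$$

where $J_m(\theta) = I_m$ for $\theta = \theta_m$.

(iii) Let $h_1 = \cdots = h_M = h$. If $h = h_\gamma = \log\big(\gamma \sum_{m=1}^{M} I_m \zeta_m^2\big)$, then, as $\gamma \to \infty$,

$$\mathrm{ARL}(\tau_{MC}(h_\gamma)) = \gamma[1 + o(1)],$$

and, for all $m = 1, \ldots, M$,

$$\inf_{\tau \in C_\gamma} \mathrm{SADD}_{\theta_m}(\tau) \sim \mathrm{SADD}_{\theta_m}(\tau_{MC}(h_\gamma)) \sim \frac{\log \gamma}{I_m}. \qquad (9.13)$$

(iv) If $h_m = h_m(\gamma) = \log(\gamma M I_m \zeta_m^2)$, then, as $\gamma \to \infty$,

$$\mathrm{ARL}(\tau_m^c(h_m(\gamma))) = M\gamma[1 + o(1)], \quad m = 1, \ldots, M;$$
$$\mathrm{ARL}(\tau_{MC}(\mathbf{h}(\gamma))) = \gamma[1 + o(1)],$$

and, for all $m = 1, \ldots, M$, asymptotic relations (9.13) hold.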

Note that Theorem 5.1 (iii) and (iv) imply that the M-C-CUSUM is asymptotically first-order optimal at the points $\theta_1, \ldots, \theta_M$ in both considered scenarios. The conditions of the theorem hold for the exponential family of distributions.

Note also that similar results hold for the M-C-SR detection procedure (where the CUSUM statistics $W_m^c(n)$ are replaced with the SR statistics $R_m^c(n)$), i.e., it is also asymptotically minimax at the points $\theta_1, \ldots, \theta_M$.

In Section 7, we consider a decentralized detection procedure that uses compressed data $(U_1(n), \ldots, U_N(n))$ obtained by quantizing the data at sensors, so that the required bandwidth for communication with the fusion
center  is  minimal.  In  this  context,  the  use  of  the  multichart  detection  procedures  is  especially  important, 
since  it  allows  us  to  perform  quantization  in  a  (small)  number  of  isolated  points.  It  is  interesting  to  investigate 
how  the  loss  in  information  (caused  by  quantization)  affects  the  efficiency  of  detection  procedures.  This 
problem  will  be  addressed  below  in  detail.  The  advantage  of  these  tests  is  that  they  do  not  require  any 
processing  power  at  the  sensors. 


6.  Decentralized  Detection  Based  on  Local  Decisions  at  Sensors  for  Known  Models 


We  now  consider  three  decentralized  detection  schemes  with  full  memory  that  perform  local  detection  in 
the  sensors  and  then  transmit  these  local  binary  decisions  to  the  fusion  center  for  optimal  combining  and 
final  decision-making.  Obviously,  these  schemes  require  minimum  bandwidth  for  communication  with  the 
fusion  center.  The  abbreviation  LD-CUSUM  will  be  used  for  procedures  that  perform  CUSUM  tests  in 
sensors  and  use  local  decisions. 


6.1.  Asymptotically  Optimal  Decentralized  LD-CUSUM  Test 

Let

$$W_i(n) = \max\{0,\; W_i(n-1) + Z_i(n)\}, \qquad W_i(0) = 0,$$

be the CUSUM statistic in the $i$-th sensor, where $Z_i(n) = \log[f_1^{(i)}(X_i(n)) / f_0^{(i)}(X_i(n))]$ is the LLR, and let

$$U_i(n) = \begin{cases} 1 & \text{if } W_i(n) \ge \omega_i h, \\ 0 & \text{otherwise}, \end{cases}$$

where $\omega_i = I_i / I_{\mathrm{tot}} = I_i / \sum_{j=1}^{N} I_j$ ($I_i = \mathsf{E}_0[Z_i(1)]$) and $h$ is a positive threshold. The stopping time is defined as

$$T_{\mathrm{ld}}(h) = \min\Big\{n \ge 1: \min_{1 \le i \le N} [W_i(n)/\omega_i] \ge h\Big\}. \qquad (9.14)$$

In other words, binary local decisions (1 or 0) are transmitted to the fusion center, and the change is declared at the first time when $U_i(n) = 1$ for all sensors $i = 1, \ldots, N$.
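The voting rule just described can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-sensor LLR increments are supplied directly as numbers, each sensor compares its local CUSUM statistic with a threshold proportional to its K-L number, and the fusion center stops when all local decisions equal 1. The function name `ao_ld_cusum` and the inputs are hypothetical.

```python
from typing import List, Optional, Sequence


def ao_ld_cusum(
    llr_increments: Sequence[Sequence[float]],
    kl_numbers: Sequence[float],
    h: float,
) -> Optional[int]:
    """First time (1-based) at which every sensor reports the local decision 1."""
    total = sum(kl_numbers)
    weights = [info / total for info in kl_numbers]  # I_i / I_tot
    w: List[float] = [0.0] * len(kl_numbers)
    for n, z in enumerate(llr_increments, start=1):
        # Local CUSUM recursion in each sensor.
        w = [max(0.0, wi + zi) for wi, zi in zip(w, z)]
        # One bit per sensor is sent to the fusion center.
        decisions = [1 if wi >= om * h else 0 for wi, om in zip(w, weights)]
        if all(decisions):
            return n
    return None


increments = [(-1.0, -1.0), (1.0, 1.0), (0.5, 1.0), (0.0, 1.0)]
stop_time = ao_ld_cusum(increments, kl_numbers=[1.0, 3.0], h=4.0)
```

With K-L numbers 1 and 3 and $h = 4$, the local thresholds are 1 and 3, so the more informative sensor is required to accumulate proportionally more evidence before voting.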

It follows from Mei [104] that if $\mathsf{E}_0[|Z_i(1)|^3] < \infty$, then $\mathrm{ARL}(T_{\mathrm{ld}}(h)) \ge e^h$ for every $h > 0$. Under an additional Cramér-type condition, it follows from Dragalin, Tartakovsky, and Veeravalli [49] that

$$\mathrm{SADD}(T_{\mathrm{ld}}(h)) = \frac{h}{I_{\mathrm{tot}}} + C_N \sqrt{\frac{h}{I_{\mathrm{tot}}}} + c + o(1) \quad \text{as } h \to \infty, \qquad (9.15)$$

where $c$ is a computable constant that depends on the model and

$$C_N = \mathsf{E} \max_{1 \le i \le N} \Big\{\frac{\sigma_i Y_i}{I_i}\Big\}, \qquad (9.16)$$

$Y_1, \ldots, Y_N$ are independent standard Gaussian random variables; $\sigma_i = \sqrt{\mathrm{Var}_1(Z_i(1))}$; and $\mathrm{Var}_1$ is the operator of variance under $f_1^{(i)}$.

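Therefore, if $h = \log \gamma$, then

$$\inf_{\tau \in C_\gamma} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(T_{\mathrm{ld}}(h)) \sim \frac{\log \gamma}{I_{\mathrm{tot}}} \quad \text{as } \gamma \to \infty, \qquad (9.17)$$

and the detection test $T_{\mathrm{ld}}(h)$ is globally first-order asymptotically optimal (AO). Correspondingly, we will use the abbreviation AO-LD-CUSUM for this test in the rest of the report.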

However, since the second term in the asymptotic approximation (9.15) is on the order of the square root of the threshold, it is expected that the convergence to the optimum is slow. Furthermore, the performance degradation compared to the optimal centralized test is expected to become more severe as the number of sensors grows, since the constant $C_N$ given by (9.16) increases with $N$. Note that for the optimal centralized CUSUM and SR tests and for the decentralized CUSUM and SR tests with binary quantization introduced below in Section 7, the residual terms are constants. We therefore expect that for moderate false alarm rates typical of practical applications the procedures with quantization may perform better. In Subsection 9.1, this conjecture is verified for the Poisson model.

It is worth mentioning that results similar to (9.15) and (9.17) are not available for the LD-SR detection test (where local voting is based on the SR statistics $R_i(n)$ in place of the CUSUM statistics $W_i(n)$) in the class $C_\gamma$. It turns out that the renewal property of the CUSUM statistics $W_i(n)$ plays a crucial role under the ARL to false alarm constraint (as well as under the local PFA constraint). However, it follows from the work of Tartakovsky and Veeravalli [173] that the LD-SR detection test can be effectively constructed in a Bayesian setting.

6.2.  Decentralized  Minimal  and  Maximal  LD-CUSUM  and  LD-SR  Tests 

Let $T_i^{cs}(h) = \min\{n: W_i(n) \ge h\}$ denote the (local) stopping time of the CUSUM test in the $i$-th sensor. Introduce the stopping times

$$T_{\min}(h) = \min(T_1^{cs}, \ldots, T_N^{cs}) \quad \text{and} \quad T_{\max}(h) = \max(T_1^{cs}, \ldots, T_N^{cs})$$

that will be referred to as minimal LD-CUSUM (Min-LD-CUSUM) and maximal LD-CUSUM (Max-LD-CUSUM) tests, respectively. Similarly, we may define Min-LD-SR and Max-LD-SR tests based on the local SR stopping times in sensors $T_i^{SR}(h) = \min\{n: \log R_i(n) \ge h\}$. Below we focus on the CUSUM-based tests, keeping in mind that the results hold for the SR-based tests as well.

Consider first the false alarm rate for these detection tests. Clearly, $\mathrm{ARL}(T_{\max}) \ge \mathrm{ARL}(T_i^{cs})$ for every $i = 1, \ldots, N$. Since $\mathrm{ARL}(T_i^{cs}) \ge e^h$, it follows that $\mathrm{ARL}(T_{\max}) \ge e^h$ for every $h > 0$. We now show that $\mathrm{ARL}(T_{\min}) \ge N^{-1} e^h$ for every $h > 0$. Indeed,

$$T_{\min} = \min\{n: \max_{1 \le i \le N} W_i(n) \ge h\} \ge \min\{n: \max_{1 \le i \le N} R_i(n) \ge e^h\} \ge \min\{n: G_N(n) \ge e^h/N\} = \eta,$$

where $G_N(n) = N^{-1} \sum_{i=1}^{N} R_i(n)$. Since $G_N(n) - n$ is a zero-mean $\mathsf{P}_\infty$-martingale and since $G_N(\eta) \ge e^h/N$, it follows from the optional sampling theorem that $\mathrm{ARL}(T_{\min}) \ge \mathrm{ARL}(\eta) = \mathsf{E}_\infty G_N(\eta) \ge e^h/N$.

However, these inequalities are usually very conservative. For large threshold values, asymptotically sharp approximations can be derived as follows. It follows from Pollak and Tartakovsky [126], Tartakovsky [165] that, as $h \to \infty$, under the no-change hypothesis, the stopping times $T_i^{cs}$, $i = 1, \ldots, N$, are exponentially distributed with mean values $e^h / (\zeta_i^2 I_i)$, where $\zeta_i$ are constants that are defined by (9.24) replacing $S_q(\eta_h)$ by $\sum_{k=1}^{\eta_h} Z_i(k)$. These constants can be computed numerically for any particular model using renewal arguments. Therefore, for a large threshold, $T_{\min}(h)$ is approximately exponentially distributed with mean $\mathrm{ARL}(T_{\min}) \sim e^h / c_N$, where $c_N = \sum_{i=1}^{N} \zeta_i^2 I_i$, while the mean of the stopping time $T_{\max}$ is

$$\mathrm{ARL}(T_{\max}) \sim e^h / d_N \quad \text{as } h \to \infty,$$

where $d_N < c_N$ can be easily computed for any $N$. In particular, for $N = 5$ and in the symmetric case, $d_5 = (60/137)\zeta^2 I \approx 0.44\,\zeta^2 I$ and $c_5 = 5\zeta^2 I$.
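The constants for $N = 5$ can be checked with a short computation. The sketch below assumes, as in the approximation above, that in the symmetric case the local stopping times behave like independent exponential random variables with a common rate proportional to $\zeta^2 I$. Then the minimum of $N$ such variables has rate $N$ times the common rate, and the maximum has mean equal to the harmonic number $1 + 1/2 + \cdots + 1/N$ times the common mean. The function name `rate_factors` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def rate_factors(n_sensors: int) -> Tuple[Fraction, Fraction]:
    """Return (c_N, d_N) in units of zeta^2 * I for the symmetric case."""
    # Minimum of N iid exponentials: the rates add up.
    c_n = Fraction(n_sensors)
    # Maximum of N iid exponentials: mean is the harmonic number H_N
    # times the common mean, so the effective rate factor is 1 / H_N.
    harmonic = sum(Fraction(1, k) for k in range(1, n_sensors + 1))
    d_n = 1 / harmonic
    return c_n, d_n


c5, d5 = rate_factors(5)
```

Exact rational arithmetic is used so that the factor $60/137$ quoted in the text is recovered without rounding.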

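In order to derive an asymptotic approximation for $\mathrm{SADD}(T_{\min})$, note that $\mathsf{E}_1 T_{\min}(h) \le \mathsf{E}_1 T_i^{cs}(h)$ for all $i = 1, \ldots, N$ and, hence,

$$\mathsf{E}_1 T_{\min}(h) \le \frac{h}{\max_{1 \le i \le N} I_i}\,(1 + o(1)) \quad \text{as } h \to \infty,$$

since $\mathsf{E}_1[T_i^{cs}(h)] \sim h/I_i$.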

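To derive an approximation for $\mathrm{SADD}(T_{\max})$, introduce the stopping time $\eta(h) = \min\{n \ge 1: \min_{1 \le i \le N} W_i(n) \ge h\}$ and note that $\eta(h) \ge T_{\max}(h)$. Since $W_i(n) = S_i(n) - \min_{0 \le k \le n} S_i(k)$, where $S_i(n) = \sum_{k=1}^{n} Z_i(k)$ and $S_i(0) = 0$, and the second term is a slowly changing sequence, applying Theorem 2.3 of Tartakovsky [164] yields

$$\mathsf{E}_1 \eta(h) \sim \frac{h}{\min_{1 \le i \le N} I_i} \quad \text{as } h \to \infty,$$

which implies that

$$\mathsf{E}_1[T_{\max}(h)] \le \frac{h}{\min_{1 \le i \le N} I_i}\,(1 + o(1)) \quad \text{as } h \to \infty.$$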

Therefore, taking thresholds $h = \log(\gamma c_N)$ in the Min-LD-CUSUM and $h = \log(\gamma d_N)$ in the Max-LD-CUSUM, we obtain the bounds for tradeoff curves that relate the SADD and the ARL, as $\gamma \to \infty$:

$$\mathrm{SADD}(T_{\min}) \le \frac{\log \gamma}{\max_{1 \le i \le N} I_i}\,(1 + o(1)), \qquad \mathrm{SADD}(T_{\max}) \le \frac{\log \gamma}{\min_{1 \le i \le N} I_i}\,(1 + o(1)).$$

Thus, in the symmetric case where $I_i = I$, the asymptotic relative efficiency of these detection tests compared to the optimal centralized test (defined as the ratio of the limiting values of SADDs, see (9.34) below) is $\mathrm{ARE}(T_{\min}; \tau_c) \le \mathrm{ARE}(T_{\max}; \tau_c) \le N$.

Note  that  while  based  on  the  first-order  asymptotics  it  may  be  expected  that  in  the  symmetric  case 
the  Max-LD-CUSUM  test  may  perform  as  well  as  the  Min-LD-CUSUM  test,  Monte  Carlo  simulations  in 
Section  9.1  show  that  the  Min-LD-CUSUM  test  performs  better  even  in  the  symmetric  case.  The  same 
conclusion  has  been  reached  by  Moustakides  [112]  based  on  the  analysis  of  a  2-sensor  continuous-time 
Brownian  motion  model. 


7.  Decentralized  Detection  Based  on  Quantization  at  Sensors  for  Known  Models 

Consider the scenario where, based on the information available at sensor $S_i$ at time $n$, a message $U_i(n)$ belonging to a finite alphabet of size $M_i$ (e.g., binary) is formed and sent to the fusion center (see Figure 9.2). Write $\mathbf{U}(n) = (U_1(n), \ldots, U_N(n))$ for the vector of $N$ messages at time $n$. Based on the sequence of sensor messages, a decision about the change is made at the fusion center. This test is identified with a stopping time on $\{\mathbf{U}(n)\}_{n \ge 1}$ at which it is declared that a change has occurred. The goal is to find tests based on $\{\mathbf{U}(n)\}_{n \ge 1}$ that optimize the tradeoff between detection delay and false alarm rate.

Various information structures are possible for the decentralized configuration depending on how feedback and local information are used at the sensors. Here we consider the simplest information structure where the message $U_i(n)$ formed by sensor $S_i$ at time $n$ is a function of only its current observation $X_i(n)$, i.e., $U_i(n) = \psi_{i,n}(X_i(n))$. Moreover, since for a particular sensor $S_i$ the sequence $\{X_i(n)\}_{n \ge 1}$ is assumed to be iid, it is natural to confine ourselves to stationary quantizers for which the quantizing functions do not depend on $n$, i.e., $\psi_{i,n} = \psi_i$ for all $n \ge 1$. The quantizing functions $\psi = \{\psi_i: i = 1, \ldots, N\}$, together with the fusion center stopping time $\tau$, form a policy $\phi = (\psi, \tau)$.

Let $H_\nu$ be the hypothesis that the change occurs at time $\nu \in \{1, 2, \ldots\}$, and let $H_\infty$ be the hypothesis that there is no change. Also, let $g_0^{(i)}$ and $g_1^{(i)}$ denote the probability mass function (pmf) induced on $U_i$ when the observation $X_i(n)$ is distributed as $f_i$ and $g_i$, respectively. Then, for fixed sensor quantizers, the LLR between the hypotheses $H_k$ and $H_\infty$ at the fusion center is given by

$$Z_q(k, n) = \sum_{j=k}^{n} \sum_{i=1}^{N} \log \frac{g_1^{(i)}(U_i(j))}{g_0^{(i)}(U_i(j))}. \qquad (9.18)$$


Hereafter  the  superscript  index  q  stands  for  quantized  versions  of  the  corresponding  variables  to  distinguish 
from  the  centralized  case  where  we  used  the  superscript  c.  For  fixed  sensor  quantizers,  the  fusion  center 
faces  a  standard  change  detection  problem  based  on  the  vector  observation  sequence  {U(n)}. 

Here the goal is to choose the policy $\phi$ that minimizes $\mathrm{SADD}(\phi)$ defined by

$$\mathrm{SADD}(\phi) = \sup_{1 \le \nu < \infty} \mathsf{E}_\nu(\tau - \nu \mid \tau \ge \nu) \qquad (9.19)$$

while maintaining the ARL to false alarm at a level not less than $\gamma > 1$.

We can define the CUSUM and SR statistics by $W_q(n) = \max\{0, \max_{1 \le \nu \le n} Z_q(\nu, n)\}$ and $R_q(n) = \sum_{\nu=1}^{n} e^{Z_q(\nu, n)}$, respectively, which obey the recursions:

$$W_q(n) = \max\{0,\; W_q(n-1) + Z_q(n, n)\}, \qquad W_q(0) = 0;$$
$$R_q(n) = [1 + R_q(n-1)] \exp\{Z_q(n, n)\}, \qquad R_q(0) = 0. \qquad (9.20)$$

Then the CUSUM and SR detection procedures at the fusion center, $T_{cs}^q(h)$ and $T_{SR}^q(a)$, are, respectively, given by

$$T_{cs}^q(h) = \min\{n \ge 1: W_q(n) \ge h\}, \qquad T_{SR}^q(a) = \min\{n \ge 1: \log R_q(n) \ge a\}, \qquad (9.21)$$

where $h$ and $a$ are positive thresholds which are selected so that $\mathrm{ARL}(T_{cs}^q) \ge \gamma$ and $\mathrm{ARL}(T_{SR}^q) \ge \gamma$.
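The equivalence between the direct definitions of the CUSUM and SR statistics and their one-step recursions can be checked numerically. In the sketch below, the list `z` plays the role of the fused LLR increments $Z_q(n, n)$; its values and the function names are purely illustrative.

```python
import math
from typing import List, Sequence, Tuple


def recursive_stats(z: Sequence[float]) -> List[Tuple[float, float]]:
    """CUSUM and SR statistics computed by their one-step recursions."""
    w, r = 0.0, 0.0
    out = []
    for zn in z:
        w = max(0.0, w + zn)
        r = (1.0 + r) * math.exp(zn)
        out.append((w, r))
    return out


def direct_stats(z: Sequence[float]) -> List[Tuple[float, float]]:
    """Same statistics from the definitions via the partial sums Z(nu, n)."""
    out = []
    for n in range(1, len(z) + 1):
        partial = [sum(z[nu - 1:n]) for nu in range(1, n + 1)]
        w = max(0.0, max(partial))
        r = sum(math.exp(s) for s in partial)
        out.append((w, r))
    return out


z = [0.3, -1.0, 0.7, 0.2]
rec = recursive_stats(z)
ref = direct_stats(z)
```

The recursive form needs constant memory per step, which is what makes both procedures practical at a fusion center.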

Let $I_i^q = \mathsf{E}_1[\log(g_1^{(i)}(U_i(1)) / g_0^{(i)}(U_i(1)))]$ denote the K-L information number for quantized data in the $i$-th sensor (i.e., the divergence between $g_1^{(i)}$ and $g_0^{(i)}$), and let $I_{\mathrm{tot}}^q = \sum_{i=1}^{N} I_i^q$ be the total K-L information accumulated from all sensors.

Similar to (9.2), we obtain that the detection procedures $T_{cs}^q(h)$ and $T_{SR}^q(a)$ given in (9.21), with $h = a = \log \gamma$, are asymptotically minimax optimal as $\gamma \to \infty$ among all procedures with ARL to false alarm greater than $\gamma$ (for fixed quantizers $\psi_i$). To be specific,

$$\inf_{\tau \in C_\gamma} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(T_{cs}^q) \sim \mathrm{SADD}(T_{SR}^q) \sim \frac{\log \gamma}{I_{\mathrm{tot}}^q} \quad \text{as } \gamma \to \infty.$$

This result immediately reveals how to choose the sensor quantizers: it is asymptotically optimum (as $\gamma \to \infty$) for sensor $S_i$ at time $n$ to select $U_i$ to maximize $I_i^q$, the K-L information number. By Tsitsiklis [176], an optimal $\psi_i$ that maximizes $I_i^q$ is a monotone likelihood ratio quantizer (MLRQ), i.e., there exist thresholds $a_1, a_2, \ldots, a_{M_i - 1}$ satisfying $-\infty < a_1 \le a_2 \le \cdots \le a_{M_i - 1}$ such that

$$\psi_{i,\mathrm{opt}}(X_i) = b_i \quad \text{only if} \quad a_{b_i} < Z_i(X_i) \le a_{b_i + 1}, \qquad (9.22)$$

where $Z_i(X_i) = \log[g_i(X_i) / f_i(X_i)]$ is the LLR at the observation $X_i$ (at sensor $S_i$). Note that the function $\psi_{i,\mathrm{opt}}$ is independent of $n$.

Thus, the asymptotically optimum policy $\phi_{\mathrm{opt}}$ for a decentralized change detection problem consists of a stationary (in time) set of MLRQs at the sensors followed by CUSUM or SR procedures based on $\{\mathbf{U}(n)\}$ at the fusion center (as described in (9.21)).

For each $i$, we denote the corresponding pmfs induced on $U_i(n)$ by $g_{0,\mathrm{opt}}^{(i)}$ and $g_{1,\mathrm{opt}}^{(i)}$ (i.e., at the output of the stationary MLRQ $\psi_{i,\mathrm{opt}}$ that maximizes $I_i^q$). Then the effective total K-L information number between the change and no-change hypotheses at the fusion center is given by

$$I_{\mathrm{tot,opt}}^q = \sum_{i=1}^{N} I\big(g_{1,\mathrm{opt}}^{(i)}, g_{0,\mathrm{opt}}^{(i)}\big). \qquad (9.23)$$

Further, we denote by $T_{cs}^q$ and $T_{SR}^q$, respectively, the CUSUM and SR stopping rules at the fusion center for the case where the sensor quantizers are chosen to be $\psi_{\mathrm{opt}} = \{\psi_{i,\mathrm{opt}}\}$. Finally, we denote by $\phi_{\mathrm{opt}}^{cs} = (\psi_{\mathrm{opt}}, T_{cs}^q)$ and $\phi_{\mathrm{opt}}^{SR} = (\psi_{\mathrm{opt}}, T_{SR}^q)$ the corresponding CUSUM and SR policies, respectively, with optimal quantization.

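We also need the following additional notation:

$$S_q(n) = \sum_{k=1}^{n} \sum_{i=1}^{N} \log \frac{g_1^{(i)}(U_i(k))}{g_0^{(i)}(U_i(k))}, \quad S_q(0) = 0; \qquad \eta_h = \min\{n \ge 1: S_q(n) \ge h\};$$

$$\zeta_q = \lim_{h \to \infty} \mathsf{E}_1 \exp\{-(S_q(\eta_h) - h)\}, \qquad (9.24)$$

where $\zeta_q$ can be computed using renewal-theoretic arguments.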

7.1.  Optimality  Properties  of  CUSUM  and  SR  Procedures 

The  asymptotic  performance  of  the  asymptotically  optimum  solutions  to  the  decentralized  change  detection 
problem  described  above  is  given  in  the  following  theorem. 

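Theorem 7.1. Suppose $I_{\mathrm{tot,opt}}^q$ is positive and finite.

(i) Then $h = a = \log \gamma$ implies that $\mathrm{ARL}(T_{cs}^q) \ge \mathrm{ARL}(T_{SR}^q) \ge \gamma$.

(ii) If, in addition, $Z_q(1, 1)$ is non-arithmetic, then

$$\mathrm{ARL}(T_{cs}^q(h)) \sim \frac{e^h}{\zeta_q^2 I_{\mathrm{tot,opt}}^q}, \qquad \mathrm{ARL}(T_{SR}^q(a)) \sim e^a / \zeta_q \quad \text{as } h, a \to \infty; \qquad (9.25)$$

(iii) If $a = h = \log \gamma$, then

$$\inf \mathrm{SADD}(\phi) \sim \mathrm{SADD}(\phi_{\mathrm{opt}}^{cs}) \sim \mathrm{SADD}(\phi_{\mathrm{opt}}^{SR}) \sim \frac{\log \gamma}{I_{\mathrm{tot,opt}}^q} \quad \text{as } \gamma \to \infty. \qquad (9.26)$$

If $h = h_\gamma = \log[\zeta_q^2 I_{\mathrm{tot,opt}}^q \gamma]$ and $a = a_\gamma = \log(\zeta_q \gamma)$, then $\mathrm{ARL}(T_{cs}^q(h_\gamma)) \sim \gamma$ and $\mathrm{ARL}(T_{SR}^q(a_\gamma)) \sim \gamma$ as $\gamma \to \infty$ and asymptotic relations (9.26) hold.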

7.2.  Binary  Quantization 

We now continue by considering the simplest case where $U_i(n) = \psi_i(X_i(n))$ are the outputs of binary quantizers and specialize the previous results to this case. Also, in the rest of this section we will consider only the CUSUM detection procedure with the understanding that analogous results hold for the SR procedure. It follows from Theorem 7.1 that the optimal binary quantizer is the MLRQ that is given by

$$U_i = \psi_i(X) = \begin{cases} 1 & \text{if } Z_i(X) = \log[g_i(X)/f_i(X)] \ge t_i, \\ 0 & \text{otherwise}, \end{cases} \qquad (9.27)$$

where $t_i$ is a threshold that maximizes the K-L information in the resulting Bernoulli sequence.

To be precise, for $l = 0, 1$, let $g_l^{(i)}$ denote the probability induced on $U_i(n)$ when the observation $X_i(n)$ is in the pre- and post-change modes. Let $\beta_{0,i} = g_0^{(i)}(U_i(j) = 1)$ and $\beta_i = g_1^{(i)}(U_i(j) = 1)$ denote the corresponding probabilities under the normal and the anomalous conditions, respectively. The resulting binary (Bernoulli) sequences $\{U_i(j), i = 1, \ldots, N\}$, $j \ge 1$, are then used to form the binary CUSUM statistic similar to (9.20) as

$$W_b(n) = \max\Big\{0,\; W_b(n-1) + \sum_{i=1}^{N} Z_i^b(n)\Big\}, \qquad W_b(0) = 0, \qquad (9.28)$$

where $Z_i^b(n) = \log \frac{g_1^{(i)}(U_i(n))}{g_0^{(i)}(U_i(n))}$ is the partial LLR between the “change” and “no-change” hypotheses for the binary sequence, which is given by

$$Z_i^b(n) = a_i U_i(n) + a_{0,i}. \qquad (9.29)$$

Here

$$a_i = \log \frac{\beta_i (1 - \beta_{0,i})}{\beta_{0,i} (1 - \beta_i)}, \qquad a_{0,i} = \log \frac{1 - \beta_i}{1 - \beta_{0,i}}.$$

Then the CUSUM detection procedure at the fusion center is given by the stopping time

$$T_{cs}^b(h) = \min\{n \ge 1: W_b(n) \ge h\}, \qquad (9.30)$$

where $h$ is a positive threshold which is selected so that $\mathrm{ARL}(T_{cs}^b(h)) \ge \gamma$. In what follows this detection procedure will be referred to as the binary quantized CUSUM test (BQ-CUSUM).
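The BQ-CUSUM test at the fusion center reduces to a linear update in the received bits. The sketch below assumes the per-sensor probabilities of sending a 1 before and after the change are known; the function names and the numerical values are hypothetical and serve only to illustrate the Bernoulli LLR coefficients and the CUSUM recursion.

```python
import math
from typing import List, Optional, Sequence, Tuple


def bernoulli_llr_coeffs(beta0: float, beta1: float) -> Tuple[float, float]:
    """Slope and intercept of the Bernoulli LLR: Z = a * U + a0."""
    a = math.log(beta1 * (1.0 - beta0) / (beta0 * (1.0 - beta1)))
    a0 = math.log((1.0 - beta1) / (1.0 - beta0))
    return a, a0


def bq_cusum(
    bits: Sequence[Sequence[int]],
    beta0: Sequence[float],
    beta1: Sequence[float],
    h: float,
) -> Optional[int]:
    """First time (1-based) the binary CUSUM statistic reaches h."""
    coeffs: List[Tuple[float, float]] = [
        bernoulli_llr_coeffs(b0, b1) for b0, b1 in zip(beta0, beta1)
    ]
    w = 0.0
    for n, u in enumerate(bits, start=1):
        # Sum of partial LLRs over sensors, then the CUSUM recursion.
        w = max(0.0, w + sum(a * ui + a0 for (a, a0), ui in zip(coeffs, u)))
        if w >= h:
            return n
    return None


a, a0 = bernoulli_llr_coeffs(0.2, 0.6)
received = [(0, 0), (1, 1), (1, 1), (1, 0)]
alarm = bq_cusum(received, beta0=[0.2, 0.2], beta1=[0.6, 0.6], h=4.0)
```

Since each sensor sends a single bit per observation, the fusion center only needs the two coefficients per sensor to run the test.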

The BQ-CUSUM procedure with $h = \log \gamma$ is asymptotically optimal as $\gamma \to \infty$ in the class of tests with binary quantization in the sense of minimizing the SADD in the class $\Delta(\gamma)$. More specifically, the tradeoff curve for the optimal binary test is

$$\mathrm{SADD}(T_{cs}^b) \sim \frac{\log \gamma}{I_{\mathrm{tot}}^b}, \quad \gamma \to \infty, \qquad (9.31)$$

where $I_{\mathrm{tot}}^b = \max_{t_1, \ldots, t_N} \sum_{i=1}^{N} I_i^b(t_i)$ is the total maximal K-L distance (optimized over the quantization thresholds $t_i$); $I_i^b(t_i) = \beta_i(t_i) a_i(t_i) + a_{0,i}(t_i)$ is the K-L distance for the binary sequence in the $i$-th sensor for the quantization threshold $t_i$.

To optimize the performance, one should choose thresholds $t_1, \ldots, t_N$ so that the K-L divergence is maximized, i.e.,

$$t_i^0 = \arg\max_{t_i} I_i^b(t_i), \quad i = 1, \ldots, N, \qquad (9.32)$$

in which case the supremum average detection delay for the optimal BQ-CUSUM test is

$$\mathrm{SADD}(T_{cs}^b) = \frac{\log \gamma}{I_{\mathrm{tot}}^b(\mathbf{t}^0)} + O(1) \quad \text{as } \gamma \to \infty, \qquad (9.33)$$

where $I_{\mathrm{tot}}^b(\mathbf{t}^0) = \sum_{i=1}^{N} \max_{t_i} I_i^b(t_i) = \sum_{i=1}^{N} I_i^b(t_i^0)$ and $\mathbf{t}^0 = (t_1^0, \ldots, t_N^0)$.
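The threshold optimization can be illustrated numerically for a single sensor. The sketch below assumes the Gaussian model $\mathcal{N}(0,1) \to \mathcal{N}(\theta,1)$ and applies the threshold directly to the observation (for this model the LLR is increasing in the observation, so this is equivalent to thresholding the LLR). Under this reading a simple grid search reproduces the $\theta = 1$ column of Table 9.1 to the printed precision. The function names and the grid size are hypothetical choices.

```python
import math
from typing import Tuple


def gaussian_tail(x: float) -> float:
    """Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_kl(t: float, theta: float) -> float:
    """K-L information of the bit 1{X >= t} for N(0,1) -> N(theta,1)."""
    b0 = gaussian_tail(t)           # P(U = 1) before the change
    b1 = gaussian_tail(t - theta)   # P(U = 1) after the change
    return b1 * math.log(b1 / b0) + (1.0 - b1) * math.log((1.0 - b1) / (1.0 - b0))


def best_threshold(theta: float, points: int = 20001) -> Tuple[float, float]:
    """Grid search on [0, theta] for the threshold maximizing the binary K-L."""
    grid = (theta * k / (points - 1) for k in range(points))
    t_opt = max(grid, key=lambda s: binary_kl(s, theta))
    return t_opt, binary_kl(t_opt, theta)


t_opt, kl_opt = best_threshold(1.0)
are = kl_opt / (1.0 ** 2 / 2.0)  # binary K-L over the full Gaussian K-L theta^2 / 2
```

A grid search is used only for transparency; any one-dimensional optimizer would serve, since the objective is smooth in the threshold.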

7.3.  Relative  Efficiency 

The asymptotic relative efficiency (ARE) of a detection procedure $\tau_\gamma$ with respect to a detection procedure $\eta_\gamma$, both of which meet the same lower bound $\gamma$ for the ARL, will be defined as

$$\mathrm{ARE}(\tau_\gamma; \eta_\gamma) = \lim_{\gamma \to \infty} \frac{\mathrm{SADD}(\tau_\gamma)}{\mathrm{SADD}(\eta_\gamma)}. \qquad (9.34)$$

Using (9.2) and (9.31), we obtain that the ARE of the globally asymptotically optimal test $\nu$ with respect to the BQ-CUSUM test $T_{cs}^b$ is

$$\mathrm{ARE}(\nu; T_{cs}^b) = \lim_{\gamma \to \infty} \frac{\inf_{\tau \in C_\gamma} \mathrm{SADD}(\tau)}{\mathrm{SADD}(T_{cs}^b(h_\gamma))} = \frac{I_{\mathrm{tot}}^b}{I_{\mathrm{tot}}}. \qquad (9.35)$$

Since $I_{\mathrm{tot}}$ is always larger than $I_{\mathrm{tot}}^b$, the value of ARE is less than 1. However, our study presented below shows that: (a) certain decentralized asymptotically globally optimal tests may perform worse in practically interesting prelimit situations when the false alarm rate is moderately low but not very low, and (b) the centralized CUSUM is only 20-30% better (in terms of the ADD). We therefore conclude that the BQ-CUSUM test is a good solution to the decentralized change detection problem whenever the post-change distribution is completely specified.

It can be shown that for the three particular models, namely Gaussian $\mathcal{N}(0, 1) \to \mathcal{N}(\theta, 1)$, Poisson $\mathcal{P}(1) \to \mathcal{P}(\theta)$, and Exponential $\mathrm{Exp}(1) \to \mathrm{Exp}(\lambda)$, the ARE is a monotone function of the post-change parameter taking values in the interval $[2/\pi, 1]$, with $\lim_{\theta \to 0} \mathrm{ARE}(\theta) = 2/\pi$ for the Gaussian model and the same limit $2/\pi$ as the post-change parameter tends to 1 for the other two models. Also, $\lim_{\theta \to \infty} \mathrm{ARE}(\theta) = 1$ for all three models. Therefore, we expect that in the worst case scenario (for close hypotheses) the loss due to binary quantization is about 36%, and it is small for far hypotheses. This is confirmed by simulations.

For the sake of simplicity, consider a symmetric Gaussian case where the K-L information numbers are identical in all sensors, $\mathcal{N}(0, 1)$ pre-change and $\mathcal{N}(\theta, 1)$ post-change. Table 9.1 and Figure 9.3 illustrate how the ARE evolves with respect to the parameter $\theta$.

It is seen that in the vicinity of $\theta \approx 0$ the ARE is close to $2/\pi$. While the ARE does not approach 1 quickly, the real relative efficiency RE, defined as the ratio of “real” ADDs, reaches 1 very fast: for $\theta \ge 5$ the real relative efficiency is already 1, so no further improvement is possible or necessary. Specifically, the RE is defined as $\mathrm{RE}_\theta(\gamma) = \mathrm{SADD}_\theta^c / \mathrm{SADD}_\theta^b$, where the estimates

$$\mathrm{SADD}_\theta = \max\Big\{\frac{\log \gamma}{I(\theta)},\; 1\Big\}$$

(with $I(\theta)$ the K-L information number of the corresponding test) take into account that they cannot be smaller than 1. More accurate results obtained by Monte Carlo yield similar conclusions. See Section 9.
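The RE rows of Table 9.1 can be reproduced from the tabulated K-L numbers. The sketch below is one reading of the garbled definition, not a confirmed formula: it assumes each delay estimate is the first-order value $\log\gamma / I$ truncated from below at 1. Under that assumption the computation matches the tabulated RE values for $\theta = 4$ and $\theta = 5$. The function name is hypothetical.

```python
import math


def relative_efficiency(gamma: float, kl_full: float, kl_binary: float) -> float:
    """Ratio of truncated first-order delay estimates (assumed form)."""
    delay_full = max(1.0, math.log(gamma) / kl_full)
    delay_binary = max(1.0, math.log(gamma) / kl_binary)
    return delay_full / delay_binary


# K-L numbers taken from the theta = 4 and theta = 5 columns of Table 9.1.
re_theta4_gamma1e3 = relative_efficiency(1e3, 8.0, 5.234373)
re_theta5_gamma1e3 = relative_efficiency(1e3, 12.5, 8.332713)
re_theta5_gamma1e4 = relative_efficiency(1e4, 12.5, 8.332713)
```

When both first-order estimates fall below one observation, the ratio saturates at 1, which is why the RE reaches 1 for large $\theta$ even though the ARE stays near 0.67.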


8.  A  Decentralized  Approach  for  Composite  Hypotheses:  Unknown  Parameter  Values 

8.1.  Impractical  Approach  -  Worst-case  Optimization 

We begin by considering an approach that seems intuitively appealing but turns out to be almost completely impractical. Indeed, the first idea which deserves attention is to try optimizing in the Worst Case Scenario (i.e., to optimize in the most unfavorable conditions with respect to the parameter value):

$$\inf_{\theta > 0} \mathrm{ARE}_\theta(t^0(\theta)) = \inf_{\theta > 0} \max_{t > 0} \mathrm{ARE}_\theta(t) = \lim_{\theta \to 0} \mathrm{ARE}_\theta(t^0(\theta)) = 2/\pi \approx 0.637.$$

However, this approach is impractical, which is immediately confirmed by the following computations. Possible but impractical solution: choose a small $\theta = \theta'$ such that detection with a reasonable delay is not possible for smaller values; find $t^0(\theta') = t^*$ that maximizes the K-L information; and use it in the distributed system. In Table 9.2 we use $\theta' = 0.1$. It can be seen that the ARE decreases very fast when $\theta$ increases.
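The loss from a threshold tuned to the worst case can be illustrated with a short computation. The sketch below assumes the single-sensor Gaussian model $\mathcal{N}(0,1) \to \mathcal{N}(\theta,1)$ with the threshold applied to the observation, and evaluates the ARE when the threshold is frozen at $t^* = 0.0792$ (the value listed for $\theta' = 0.1$) while the true $\theta$ varies. The function names are hypothetical; the results agree with the corresponding row of Table 9.2 to the printed precision.

```python
import math


def gaussian_tail(x: float) -> float:
    """Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def are_fixed_threshold(theta: float, t_star: float) -> float:
    """ARE of the binary test with a frozen threshold, Gaussian mean shift."""
    b0 = gaussian_tail(t_star)
    b1 = gaussian_tail(t_star - theta)
    kl_binary = b1 * math.log(b1 / b0) + (1.0 - b1) * math.log((1.0 - b1) / (1.0 - b0))
    kl_full = theta * theta / 2.0  # K-L number of the unquantized Gaussian data
    return kl_binary / kl_full


T_STAR = 0.0792  # threshold tuned to theta' = 0.1
are_values = {theta: are_fixed_threshold(theta, T_STAR) for theta in (1.0, 2.0, 5.0)}
```

The binary K-L information saturates once the post-change bit is almost always 1, while the full K-L number keeps growing as $\theta^2/2$, which explains the rapid decay of the ARE.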

8.2.  Practical  Approach  -  Decentralized  M-BQ-CUSUM  Test 

For this reason we now propose a different approach that is based on a multichart CUSUM that uses multiple reference parameter values. The quantization thresholds are optimized for these reference points, where the component (partial) BQ-CUSUM tests are optimal. As a result, this approach provides a quite accurate approximation to the entire ADD envelope.
Table 9.1: Numerical Results for the Gaussian Scenario (Two Sensors, N = 2)

$\theta$                              0.01       0.1        1          2          4          5
$t^0(\theta)$                         0.00792    0.0792     0.7941     1.60083    3.28628    4.16375
$I(\theta)$                           0.00005    0.005      0.5        2          8          12.5
$I^b(t^0(\theta) \mid \theta)$        0.000032   0.003183   0.318566   1.27879    5.234373   8.332713
$\mathrm{ARE}_\theta$                 0.636619   0.636624   0.637133   0.639395   0.654297   0.666617
$\mathrm{RE}_\theta(\gamma = 10^3)$   0.636619   0.636624   0.637133   0.639395   0.757753   1
$\mathrm{RE}_\theta(\gamma = 10^4)$   0.636619   0.636624   0.637133   0.639395   0.654297   0.904713


Figure  9.3:  Asymptotic  relative  efficiency. 


Table 9.2: Numerical Results for the Gaussian Scenario and with Worst Case Optimization

$\theta$                              0.1      0.5      1.0      1.5      2.0      3.0      5.0
$t^0(\theta)$                         0.0792   0.3963   0.7941   1.1951   1.6008   2.4306   4.1639
$\mathrm{ARE}_\theta(t^*)$            0.637    0.614    0.533    0.423    0.315    0.166    0.061
$\mathrm{ARE}_\theta(t^0(\theta))$    0.637    0.637    0.637    0.638    0.639    0.645    0.667
As in Section 5, we define an interval $[\underline{\theta}, \overline{\theta}]$ for the post-change parameter and $M \ge 2$ reference points in this interval. The M-BQ-CUSUM procedure consists of implementing $M$ BQ-CUSUM tests in parallel, each tuned to and optimized with respect to the corresponding reference point.

To be specific, let $\theta_m \in [\underline{\theta}, \overline{\theta}]$, $m = 1, 2, \ldots, M$, be a set of reference points. At each of these points, sensors perform quantization of the observations using LLR-quantizers (9.27), i.e., for the $m$-th reference point the outputs of the quantizers are $U_{i,m}(n) = \mathbb{1}\{\ell_{i,m}(n) \ge t_{i,m}\}$, where $\ell_{i,m}(n) = \log[g_i^{\theta_m}(X_i(n)) / f_i^{\mu_i}(X_i(n))]$ is the LLR tuned to $\theta_m$. The quantization threshold $t_{i,m}$ is chosen so that the K-L divergence is maximized for the corresponding point, i.e., for $i = 1, \ldots, N$ and $m = 1, \ldots, M$,

$$t_{i,m}^0 = \arg\max_{t_{i,m}} I_{i,m}^{BQ}(t_{i,m}), \qquad (9.36)$$

where $I_{i,m}^{BQ} = \mathsf{E}_{\theta_m} \ell_{i,m}^{BQ}(1) = \beta(i, m)\, a(i, m) + a_0(i, m)$ is the K-L divergence for the binary sequence at the $m$-th reference point in the $i$-th sensor, and all the notation is defined in Section 7.2. In particular, the LLR $\ell_{i,m}^{BQ}(n)$ for the binary sequence at the $m$-th reference point is given by (9.29) with obvious inclusion of the parameter $\theta_m$.

The  M-BQ-CUSUM  stopping  time  is  then  defined  as  follows.  First,  we  define  the  BQ-CUSUM  statistics 
for  each  of  the  reference  points 


    $$ W^{BQ}_m(n) = \max\Bigl\{0,\ W^{BQ}_m(n-1) + \sum_{l=1}^{N} Z^{BQ}_{l,m}(n)\Bigr\}, $$

with $W^{BQ}_m(0) = 0$, and the corresponding stopping times for the latter statistics

    $$ \tau^{BQ}_m(h_m) = \min\{n \ge 1 : W^{BQ}_m(n) \ge h_m\}. $$

The M-BQ-CUSUM stopping time is the minimum of these stopping times:

    $$ \tau_{MBQ}(\mathbf{h}) = \min_{1 \le m \le M} \tau^{BQ}_m(h_m), $$    (9.37)

where $\mathbf{h} = (h_1, h_2, \dots, h_M)$, $h_m > 0$.
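
The parallel structure of (9.37) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names `bernoulli_llr` and `m_bq_cusum`, the nested-list layout of the quantized bits, and the use of the standard Bernoulli log-likelihood ratio as the increment (standing in for the exact expression (9.29), which is not reproduced here) are assumptions made for the example.

```python
import math


def bernoulli_llr(u, p_post, p_pre):
    """Log-likelihood ratio of one quantized bit u in {0, 1} (assumed form)."""
    if u:
        return math.log(p_post / p_pre)
    return math.log((1.0 - p_post) / (1.0 - p_pre))


def m_bq_cusum(bits, p_post, p_pre, thresholds):
    """Run M BQ-CUSUM charts in parallel and stop at the first crossing.

    bits[n][m][l]  -- bit sent by sensor l for reference point m at time n + 1
    p_post[m][l]   -- P(bit = 1) after the change, at reference point m
    p_pre[m][l]    -- P(bit = 1) before the change
    thresholds[m]  -- threshold h_m of the m-th chart
    Returns the stopping time (1-based), or None if no chart crosses.
    """
    num_charts = len(thresholds)
    stats = [0.0] * num_charts                      # W_m(0) = 0
    for n, bits_n in enumerate(bits, start=1):
        for m in range(num_charts):
            increment = sum(
                bernoulli_llr(u, p_post[m][l], p_pre[m][l])
                for l, u in enumerate(bits_n[m])
            )
            stats[m] = max(0.0, stats[m] + increment)   # CUSUM recursion
        if any(stats[m] >= thresholds[m] for m in range(num_charts)):
            return n                                # minimum of the M stopping times
    return None


# One sensor, two reference points, every received bit equal to 1.
bits = [[[1], [1]] for _ in range(5)]
stop = m_bq_cusum(bits, p_post=[[0.6], [0.9]], p_pre=[[0.3], [0.3]],
                  thresholds=[2.0, 2.0])
print(stop)
```

In this toy run the second chart, whose increments are larger, crosses its threshold first, which illustrates why the overall stopping time is the minimum over the charts.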

Now, as outlined in Section 7.2, if the true value of the parameter is $\theta = \theta_i$, then the asymptotically minimax-optimal solution to the changepoint problem in the class of binary quantizers is given by the LLR-quantizer with the threshold $t^0_{l,i}$ (in the $l$-th sensor), as specified in (9.36), followed by the BQ-CUSUM stopping rule $\tau^{BQ}_i(h_i)$ at the fusion center (see (9.30)). Combining this with Theorem 5.1 allows us to conclude that the proposed M-BQ-CUSUM procedure is asymptotically optimal in the class of procedures with binary quantization when the post-change parameter is equal to one of the reference points $\theta_1, \theta_2, \dots, \theta_M$.

Exact results are given in the following theorem, which follows from the above argument and Theorem 5.1. We use the same notation for the constants $v_m$ ($0 < v_m < 1$) as above, keeping in mind that in this section they are computed for the Bernoulli sequences.

Theorem 8.1. Assume that the LLR $Z^{BQ}_{l,m}(1)$ is non-arithmetic.

(i) Let $h_1 = \cdots = h_M = h$ and let

    $$ h = h(\gamma) = \log\Bigl(\gamma \sum_{m=1}^{M} C_m\Bigr). $$

Then, as $\gamma \to \infty$,

    $$ \mathrm{ARL}(\tau_{MBQ}(h(\gamma))) = \gamma\,(1 + o(1)), $$

and, for all $m = 1, \dots, M$,

    $$ \inf_{\tau \in \Delta_{BQ}(\gamma)} \mathrm{SADD}_{\theta_m}(\tau) \sim \mathrm{SADD}_{\theta_m}(\tau_{MBQ}(h(\gamma))) \sim \frac{\log\gamma}{\sum_{l=1}^{N} I^{BQ}_{l,m}(t^0_{l,m})}. $$    (9.38)

(ii) Let, for $m = 1, \dots, M$, the thresholds $h_m = h_m(\gamma)$ be chosen so that the ARLs of the individual BQ-CUSUM tests are balanced. Then, as $\gamma \to \infty$,

    $$ \mathrm{ARL}(\tau^{BQ}_m(h_m(\gamma))) = M\gamma\,[1 + o(1)], \quad m = 1, \dots, M; $$
    $$ \mathrm{ARL}(\tau_{MBQ}(\mathbf{h}(\gamma))) = \gamma\,[1 + o(1)], $$

and, for all $m = 1, \dots, M$, asymptotic relations (9.38) hold.

Therefore, M-BQ-CUSUM is asymptotically optimal in the class $\Delta_{BQ}(\gamma)$ in the sense of minimizing $\mathrm{SADD}_{\theta_m}$, $m = 1, \dots, M$, at the reference points in both scenarios (with and without balancing). Note that results analogous to Theorem 5.1 (i)-(ii) also hold in the binary case considered.

Note that all the above formulas are valid for the exponential family. The quantization thresholds, however, depend on the model. In the symmetric Gaussian example considered in Section 9.2, the optimal thresholds for all sensors are the same. For the $m$-th CUSUM tuned to $\theta = \theta_m$, the threshold $t^0_m$ is

    $$ t^0_m = \arg\max_{t} \Bigl\{ \Phi(t - \theta_m) \log\frac{\Phi(t - \theta_m)}{\Phi(t)} + (1 - \Phi(t - \theta_m)) \log\frac{1 - \Phi(t - \theta_m)}{1 - \Phi(t)} \Bigr\} $$

and the corresponding numbers $J_m(\theta)$ are

    $$ J_m(\theta) = \Phi(t^0_m - \theta) \log\frac{\Phi(t^0_m - \theta)}{\Phi(t^0_m)} + (1 - \Phi(t^0_m - \theta)) \log\frac{1 - \Phi(t^0_m - \theta)}{1 - \Phi(t^0_m)}, $$

where $\Phi(x)$ is the standard Gaussian distribution function.
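
The maximization that defines $t^0_m$ has no closed form, but it is easy to carry out numerically. The following sketch assumes a plain grid search and uses hypothetical helper names (`std_normal_cdf`, `binary_kl`, `optimal_threshold`); it evaluates the K-L divergence of the binary sequence for a unit-variance Gaussian mean shift and locates its maximizer. For $\theta_m = 0.1$ the maximizer should land near 0.079, and the ratio of the binary K-L number to the full K-L number $\theta_m^2/2$ should be close to $2/\pi \approx 0.64$.

```python
import math


def std_normal_cdf(x):
    """Standard Gaussian distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binary_kl(t, theta):
    """K-L divergence of the bit 1{X >= t}: mean theta versus mean 0."""
    p_post = std_normal_cdf(t - theta)      # P_theta(X < t)
    p_pre = std_normal_cdf(t)               # P_0(X < t)
    return (p_post * math.log(p_post / p_pre)
            + (1.0 - p_post) * math.log((1.0 - p_post) / (1.0 - p_pre)))


def optimal_threshold(theta, lo=-2.0, hi=3.0, steps=5000):
    """Grid search for the threshold maximizing the binary K-L divergence."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: binary_kl(t, theta))


theta_m = 0.1
t_opt = optimal_threshold(theta_m)
are = binary_kl(t_opt, theta_m) / (theta_m ** 2 / 2.0)   # binary / full K-L
print(t_opt, are)
```

The default grid limits are chosen for reference points of moderate size; for much larger $\theta_m$ the search interval would need to be widened.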


9.  Monte  Carlo  Experiments 


9.1.  Monte  Carlo  Experiments  for  Simple  Hypotheses 

In this section, we present the results of MC experiments for the Poisson example where the observations in the $i$-th sensor, $X_i(n)$, $n \ge 1$, follow the common Poisson distribution $\mathcal{P}(\mu_i)$ in the pre-change mode and the common Poisson distribution $\mathcal{P}(\theta_i)$ after the change occurs, i.e., for $m = 0, 1, 2, \dots$ and $\nu = k$,

    $$ P_k(X_i(n) = m) = \begin{cases} \dfrac{\mu_i^m}{m!}\, e^{-\mu_i} & \text{for } k > n, \\ \dfrac{\theta_i^m}{m!}\, e^{-\theta_i} & \text{for } k \le n, \end{cases} $$

where without loss of generality we assume that $\theta_i > \mu_i$.

Write $Q_i = \theta_i/\mu_i$. It is easily seen that the LLR statistic in the $i$-th sensor has the form

    $$ Z_i(n) = X_i(n) \log(Q_i) - \mu_i (Q_i - 1), $$    (9.39)

and the K-L information numbers are

    $$ I_i = \theta_i \log Q_i - \mu_i (Q_i - 1), \quad i = 1, \dots, N. $$    (9.40)

It follows from (9.2), (9.40) and the above discussion that the centralized CUSUM and AO-LD-CUSUM tests with the thresholds $h = \log\gamma$ are first-order globally asymptotically optimal and

    $$ \inf_{\tau \in C_\gamma} \mathrm{SADD}(\tau) \sim \mathrm{SADD}(\tau_c) \sim \mathrm{SADD}(\tau_w) \sim \frac{\log\gamma}{\sum_{i=1}^{N} [\theta_i \log Q_i - \mu_i (Q_i - 1)]}. $$    (9.41)


This  means  that  the  ARE  of  these  detection  tests  with  respect  to  the  globally  optimal  test  is  equal  to  1. 

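In order to evaluate the ARE of an optimal test $\nu$ (e.g., the centralized CUSUM test $\tau_c$) with respect to the BQ-CUSUM test (9.30) we use (9.35), which yields

    $$ \mathrm{ARE}(\nu; \tau_{BQ}) = \frac{\sum_{i=1}^{N} \max_{t_i} [\beta_i(t_i)\, a_i(t_i) + a_{0,i}(t_i)]}{\sum_{i=1}^{N} [\theta_i \log Q_i - \mu_i (Q_i - 1)]}, $$    (9.42)

where the probabilities $\beta_{0,i}(t_i)$ and $\beta_i(t_i)$ are given by

    $$ \beta_{0,i}(t_i) = \sum_{k=\lceil t_i \rceil}^{\infty} \frac{\mu_i^k\, e^{-\mu_i}}{k!}, \qquad \beta_i(t_i) = \sum_{k=\lceil t_i \rceil}^{\infty} \frac{\theta_i^k\, e^{-\theta_i}}{k!}. $$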

The optimal values of $t_i$ that maximize the K-L numbers are easily found based on these formulas. Consider a symmetric case where $\mu_i = 10$ and $\theta_i = 12$ for all $i = 1, \dots, N$. Then $I_i = I = 0.1879$, the optimum threshold is $t^0_i = 12$, and the corresponding maximum K-L distance for the binary sequence is $I^{BQ}_i(t^0_i) = I_b = 0.119$. Therefore, the efficiency of the BQ-test relative to the globally asymptotically optimal detection procedure is $\mathrm{ARE}(\nu; \tau_{BQ}) = 0.119/0.1879 = 0.63$, i.e., for large ARL we expect a loss in efficiency of about 37% compared to the centralized CUSUM (C-CUSUM), which corresponds to an average detection delay about $1/0.63 \approx 1.59$ times that of C-CUSUM. The following MC simulations show that for the practically interesting values of the ARL (up to 13,360) the gain of the optimal C-CUSUM test is even smaller, while the AO-LD-CUSUM test performs worse than the BQ-CUSUM test due to the reasons discussed in Section 6.1.
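
The numbers quoted for the symmetric Poisson case can be reproduced with a short computation. The sketch below is only an illustration under the stated model (pre-change mean 10, post-change mean 12, integer thresholds, bit equal to 1 when the observation is at least the threshold); the helper names `poisson_tail` and `bernoulli_kl` are introduced here and do not come from the report.

```python
import math


def poisson_tail(t, lam):
    """P(X >= t) for X ~ Poisson(lam) and an integer threshold t >= 0."""
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(t))
    return 1.0 - cdf


def bernoulli_kl(p_post, p_pre):
    """K-L divergence between Bernoulli(p_post) and Bernoulli(p_pre)."""
    return (p_post * math.log(p_post / p_pre)
            + (1.0 - p_post) * math.log((1.0 - p_post) / (1.0 - p_pre)))


mu, theta = 10.0, 12.0
full_kl = theta * math.log(theta / mu) - (theta - mu)     # K-L number (9.40)

# Binary K-L number for every integer threshold in a generous search range.
kl_by_threshold = {
    t: bernoulli_kl(poisson_tail(t, theta), poisson_tail(t, mu))
    for t in range(1, 31)
}
best_t = max(kl_by_threshold, key=kl_by_threshold.get)
binary_kl = kl_by_threshold[best_t]
are = binary_kl / full_kl

print(round(full_kl, 4), best_t, round(binary_kl, 3), round(are, 2))
```

The printed values should agree with $I = 0.1879$, $t^0_i = 12$, $I_b = 0.119$ and an ARE of 0.63.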

MC simulations have been performed for the above symmetric situation (i.e., $\mu_i = \mu = 10$ and $\theta_i = \theta = 12$) with $N = 5$ sensors. We used $10^5$ MC replications in the experiment. The operating characteristics of the five detection tests (SADD vs. log(ARL)) are shown in Figure 9.4. It is seen that the BQ-CUSUM test substantially outperforms the AO-LD-CUSUM test over the entire range of false alarm rates used in the simulations. This result confirms our conjecture. It is also seen that both Min-LD-CUSUM and Max-LD-CUSUM perform worse than both the BQ-CUSUM and AO-LD-CUSUM tests.

Table 9.3 shows the relative efficiency of the BQ-CUSUM procedure with respect to four other detection procedures, which is defined as the ratio of average detection delays for the same ARL: $\mathrm{SADD}(\tau_{BQ})/\mathrm{SADD}(\nu)$, where $\nu$ is the corresponding detection test, i.e., $\nu = \tau_c$, $\tau_w$, etc. It follows from the table that the SADD of the globally optimal centralized CUSUM is smaller than that of the BQ-CUSUM by 34% for high false alarm rate, 35% for moderate and low false alarm rate, and 37% for very low false alarm rate (ratios between 1.51 and 1.59). Note that the last column presents the ARE. On the other hand, the BQ-CUSUM outperforms the AO-LD-CUSUM over the whole range of tested ARL values, from 33 to 13,360. The gain is 30% for high false alarm rate and slowly reduces to 18% for low false alarm rate.


9.2.  Monte  Carlo  Experiments  for  Composite  Hypotheses 

By means of Monte Carlo simulations, in this section we demonstrate the capabilities of the multi-chart detection techniques proposed in Section 8.2. We are particularly interested in the relative efficiency (RE) of the detection procedures as a function of the parameter $\theta$. For two procedures, $\tau$ and $\eta$, the relative efficiency of $\eta$ with respect to $\tau$ at the point $\theta$ is defined as $\mathrm{RE}_\theta(\tau, \eta) = \mathrm{SADD}_\theta(\eta)/\mathrm{SADD}_\theta(\tau)$, where it is assumed that both procedures satisfy $\mathrm{ARL} \approx \gamma$. Note that $\lim_{\gamma\to\infty} \mathrm{RE}_\theta(\tau, \eta) = \mathrm{ARE}_\theta(\tau, \eta)$.
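
As a small worked example of this definition, the sketch below recomputes the relative efficiency of 3-BQ-CUSUM with respect to BQ-CUSUM from the simulated delays reported in Table 9.4. The variable and function names are illustrative only, and the argument order (first procedure in the denominator) is the reading of the definition that matches the tabulated values.

```python
# SADD values from Table 9.4 for theta = 0.1, 0.2, ..., 0.9.
sadd_bq = [380.64, 128.2, 66.02, 40.38, 27.32, 19.69, 14.85, 11.66, 9.32]
sadd_mbq = [551.75, 163.35, 83.18, 50.62, 32.42, 22.1, 15.65, 12.06, 9.41]


def relative_efficiency(sadd_tau, sadd_eta):
    """RE(tau, eta) = SADD(eta) / SADD(tau) at a common ARL."""
    return sadd_eta / sadd_tau


re_values = [
    round(relative_efficiency(tau, eta), 2)
    for tau, eta in zip(sadd_mbq, sadd_bq)
]
print(re_values)
```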

We consider a symmetric three-sensor scenario ($N = 3$), where for each sensor both the pre- and post-change observations are i.i.d. unit-variance Gaussian random variables having expected values zero and $\theta > 0$, respectively. The observations are also assumed independent across the sensors.

Figure 9.4: Operating characteristics of detection procedures.


Table 9.3: Relative Efficiency of the Decentralized BQ-CUSUM Test

log(ARL)        3.5    4.5    5.5    6.5    7.5    8.5    9.5     $\infty$
ARL             33     90     245    665    1808   4915   13360   $\infty$
Test            Relative efficiency of the decentralized BQ-CUSUM test
C-CUSUM         1.51   1.51   1.51   1.53   1.53   1.53   1.54    1.59
AO-LD-CUSUM     0.71   0.73   0.75   0.76   0.78   0.80   0.82    1.59
Min-LD-CUSUM    0.62   0.58   0.55   0.54   0.51   0.51   0.51    0.316
Max-LD-CUSUM    0.33   0.30   0.27   0.26   0.25   0.24   0.24    0.316

All simulations have been performed for $\gamma = 10^4$ (relatively low FAR) and for $\theta$ between 0.05 and 1, which ensures a high enough $\mathrm{SADD}_\theta(\tau)$ to be able to notice the difference in performance (e.g., $\mathrm{SADD}_\theta(\tau) \approx 10$ for $\theta = 1$).

Most importantly, note that in order to minimize the communication load between the fusion center and the sensors, it is desirable to use as few reference points as possible. Moreover, when the number of reference points $M$ increases, the false alarm rate also increases for fixed thresholds. Maintaining the same ARL then requires higher threshold values, which in turn leads to an increase of the detection delay. With this in mind, we considered the cases $M = 2$ and $M = 3$.

We begin with comparing centralized CUSUM procedures, C-CUSUM and M-C-CUSUM. The red curve in Figure 9.5 represents the relative efficiency of the unbalanced 2-C-CUSUM (with a single threshold $h_m = h$) with reference points 0.1 and 0.9 with respect to C-CUSUM. It is seen that around the right reference point $\theta_2 = 0.9$ the RE is close to 1, while in the vicinity of the left reference point $\theta_1 = 0.1$ the RE is around 0.6. This is because the threshold is the same for both stopping times $\tau_{c,1}$ and $\tau_{c,2}$, in which case $\mathrm{ARL}(\tau_{MC}) \approx \mathrm{ARL}(\tau_{c,2}(h))$. As a result, for most values of the parameter the behavior of 2-C-CUSUM is similar to that of C-CUSUM tuned to $\theta = 0.9$. In order to have more efficient detection for small changes, one has to use either more reference points or a symmetric 2-C-CUSUM (with different thresholds) balancing mean times as in (9.11).

To improve the performance for small changes, we added an extra reference point 0.2. The blue curve in Figure 9.5 shows the RE of 3-C-CUSUM with respect to C-CUSUM with reference points 0.1, 0.2 and 0.9. Observe that the relative efficiency never drops below the level of approximately 80% for values of $\theta \ge 0.2$, and it is equal to 70% for $\theta = 0.1$. For smaller values of $\theta$, there is a drop in efficiency to approximately 50%. This is not surprising since this procedure is not designed to work with parameter values smaller than 0.1. We may conclude that adding the extra point does help: 3-C-CUSUM has a much better performance for small values of the parameter compared to 2-C-CUSUM. However, there is still an imbalance between the RE for small and large shifts.

Figure 9.6 (red curve) illustrates the performance of the 2-C-CUSUM in the balanced case (9.11) where every reference point contributes equally to the performance of the whole scheme. Comparing with the red curve in Figure 9.5, we observe that for the left reference point (0.1) the performance became much better (88% vs. 65%), while for the right one (0.9) it became slightly worse (90% vs. 99%). For $\theta$ around 0.4, there is a dip caused primarily by the fact that the reference points are too distant from each other.

To eliminate the drop in the middle we used an extra reference point 0.4. The result is shown in Figure 9.6, the blue curve. In this case the performance remains almost constant (uniform) for the entire range of $\theta$ (between 85% and 90% for most parameter values, and over 80% for the entire range).

It is clear that if one is interested in relatively high efficiency for all parameter values (small, moderate and large changes), then the proposed balanced approach can be recommended for implementation. However, if one is interested in rapid detection of only moderate and large changes, then the first unbalanced approach with constant thresholds can be used.

We now proceed with the results of the experimental study of the binary quantized procedures. We present the results only for unbalanced 3-BQ-CUSUM (with equal thresholds $h_1 = h_2 = h_3 = h$). Figure 9.7 shows the relative efficiency of 3-BQ-CUSUM with respect to the optimal BQ-CUSUM (which knows $\theta$). The behavior is similar to that in the centralized case. As before, the relative efficiency stays above the level of approximately 80% for values of $\theta \ge 0.2$, while for small changes there is a drop in efficiency to approximately 50%.

Figure 9.8 shows the relative efficiency of the binary 3-BQ-CUSUM with respect to the centralized 3-C-CUSUM. The RE remains at the level of approximately 70% for all parameter values. Therefore, the loss in efficiency of the procedure with binary quantization at three points (with rather low requirements on communication bandwidth) is only about 30% compared to the centralized scheme, which requires transmission of the original uncompressed data.

Finally, Table 9.4 summarizes the results of Monte Carlo simulations for asymmetric M-C-CUSUM (with equal thresholds).

Figure 9.5: Unbalanced M-C-CUSUM-to-C-CUSUM relative efficiency: $\gamma = 10^4$; reference points 0.1, 0.2 and 0.9.

Figure 9.6: Balanced M-C-CUSUM-to-C-CUSUM relative efficiency: $\gamma = 10^4$; reference points 0.1, 0.4 and 0.9.

Figure 9.7: Unbalanced 3-BQ-CUSUM-to-BQ-CUSUM relative efficiency: $\gamma = 10^4$; reference points 0.1, 0.2 and 0.9.

Figure 9.8: Unbalanced 3-BQ-CUSUM-to-3-C-CUSUM relative efficiency: $\gamma = 10^4$; reference points 0.1, 0.2 and 0.9.

Table 9.4: Summary of Numerical Results for the Gaussian Scenario ($N = 3$, $M = 3$, $\gamma = 10^4$)

$\theta$                                   0.1      0.2      0.3     0.4     0.5     0.6     0.7     0.8     0.9
$\mathrm{SADD}(\tau_c)$                    269.31   89.22    45.06   27.32   18.32   13.13   9.83    7.58    6.0
$\mathrm{SADD}(\tau_{BQ})$                 380.64   128.2    66.02   40.38   27.32   19.69   14.85   11.66   9.32
$\mathrm{SADD}(\tau_{MC})$                 394.72   112.6    57.75   35.55   22.6    15.18   10.59   7.92    6.12
$\mathrm{SADD}(\tau_{MBQ})$                551.75   163.35   83.18   50.62   32.42   22.1    15.65   12.06   9.41
$\mathrm{RE}(\tau_{MBQ}; \tau_{BQ})$       0.69     0.78     0.79    0.8     0.84    0.89    0.95    0.97    0.99
$\mathrm{RE}(\tau_{MBQ}; \tau_{MC})$       0.69     0.69     0.69    0.7     0.7     0.69    0.68    0.66    0.65

Chapter 10

Some Variants of the Quickest Change
Detection Problem and Their Solutions


This  chapter  is  intended  to  summarize  contributions  of  the  group  of  Dr.  Veeravalli  to  two  variants  of  the 
quickest  change  detection  problem  and  their  solutions. 

1.  Quickest  Change  Detection  of  a  Markov  Process  Across  a  Sensor  Array 

In  the  standard  formulation  of  the  change  detection  problem,  there  is  a  sequence  of  observations  whose 
density  changes  at  some  unknown  point  in  time  and  the  goal  is  to  detect  the  changepoint  as  soon  as  possible. 
However,  in  many  scenarios  such  as  detecting  pollutants  and  biological  warfare  agents,  the  change  process 
is  governed  by  the  movement  of  the  agent  through  the  medium.  Thus,  it  is  more  suitable  to  consider  the  case 
where  the  statistics  of  each  sensor’s  observations  may  change  at  different  points  in  time. 

In this work, we consider a Bayesian version of this problem and assume that the point of disruption (that needs to be detected) is a random variable with a geometric distribution. More general disruption models can be considered, but the case of a geometric prior has an intuitive and appealing interpretation due to the memorylessness property of the geometric random variable. In addition, the practically relevant rare disruption regime can be obtained by letting the geometric parameter go to zero. We assume that the $L$ sensors are placed in an array or a line and they observe the change as it propagates through them. The progression of change in only one strictly determined direction can be thought of as a first approximation to more realistic situations. The inter-sensor delay is modeled with a Markov model and, in particular, the focus is on the case where the inter-sensor delay is also geometric. This model can be viewed as a first-order approximation to more general propagation models, with the zeroth-order model being the case where the statistical properties of the sensors' observations change at the same time.

We study the centralized case, where the fusion center has complete information about the observations at all the $L$ sensors, the change process statistics, and the pre- and the post-change densities. This is applicable in scenarios where: i) the fusion center is geographically collocated with the sensors so that ample bandwidth is available for reliable communication between the sensors and the fusion center; and ii) the impact of the disruption-causing agent on the statistical dynamics of the change process and the statistical nature of the change so induced can be modeled accurately. Note that under the centralized model, the special case where the change happens at the same time at all sensors corresponds to the standard (single sensor) quickest change detection problem (Shiryaev [149]) with an $L$-vector observation.

1.1.  Problem  Formulation 

Consider a distributed system with an array of $L$ sensors, as in Figure 10.1, that observes an $L$-dimensional discrete-time stochastic process $Z_k = [Z_{k,1}, \dots, Z_{k,L}]$, where $Z_{k,\ell}$ is the observation at the $\ell$-th sensor at the $k$-th time instant. A disruption in the sensing environment occurs at the random time instant $\Gamma_1$, and hence, the density^1 of the observations at each sensor undergoes a change from the null density $f_0$ to the alternate density $f_1$.

[Figure: linear sensor array with observation sequences $\{Z_{k,1}\}$, $\{Z_{k,\ell}\}$, $\{Z_{k,L}\}$.]

Figure 10.1: Changepoint detection across a linear array of sensors.

Change Process Model: We consider a change process where the change-point evolves across the sensor array. In particular, the change-point as seen by the $\ell$-th sensor is denoted as $\Gamma_\ell$. We assume that the evolution of the change process is Markovian across the sensors. That is,

    $$ P(\{\Gamma_{\ell_1+\ell_2+\ell_3} = m_1 + m_2 + m_3\} \mid \{\Gamma_{\ell_1+\ell_2} = m_1 + m_2\}, \{\Gamma_{\ell_1} = m_1\}) $$
    $$ \qquad = P(\{\Gamma_{\ell_1+\ell_2+\ell_3} = m_1 + m_2 + m_3\} \mid \{\Gamma_{\ell_1+\ell_2} = m_1 + m_2\}) $$

for all $\ell_i$ and $m_i \ge 0$, $i = 1, 2, 3$. Further simplification of the analysis is possible under a joint-geometric model on $\{\Gamma_\ell\}$. Under this model, the change-point ($\Gamma_1$) evolves as a geometric random variable with parameter $\rho$, and inter-sensor change propagation is modeled as a geometric random variable with parameter $\{\rho_{\ell-1,\ell},\ \ell = 2, \dots, L\}$. That is,

    $$ P(\{\Gamma_1 = m\}) = \rho\,(1 - \rho)^m, \quad m \ge 0, \quad\text{and} $$
    $$ P(\{\Gamma_\ell = m_1 + m_2\} \mid \{\Gamma_{\ell-1} = m_2\}) = \rho_{\ell-1,\ell}\,(1 - \rho_{\ell-1,\ell})^{m_1}, \quad m_1 \ge 0, $$

independent of $m_2 \ge 0$ for all $\ell$ such that $2 \le \ell \le L$. We will find it convenient^2 to set $\rho_{0,1} = \rho$ and $\rho_{L,L+1} = 0$ so that $\rho_{\ell-1,\ell}$ is defined for all $\ell = 1, \dots, L+1$.
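
The joint-geometric model is straightforward to simulate, which can help build intuition for how the change-points spread along the array. The following is a minimal sketch under the model just described; the function names `geometric0` and `sample_change_points`, and the choice to pass the propagation parameters $\rho_{1,2}, \dots, \rho_{L-1,L}$ as a list, are assumptions made for the example.

```python
import random


def geometric0(p, rng):
    """Sample m >= 0 with P(m) = p * (1 - p)**m; requires 0 < p <= 1."""
    m = 0
    while rng.random() >= p:
        m += 1
    return m


def sample_change_points(rho, rho_prop, rng):
    """Draw (Gamma_1, ..., Gamma_L) from the joint-geometric model.

    rho       -- parameter of the first change-point Gamma_1
    rho_prop  -- [rho_{1,2}, ..., rho_{L-1,L}], inter-sensor propagation
    """
    gammas = [geometric0(rho, rng)]
    for p in rho_prop:
        # Gamma_l = Gamma_{l-1} + geometric delay, independent of the past.
        gammas.append(gammas[-1] + geometric0(p, rng))
    return gammas


rng = random.Random(0)
draws = [sample_change_points(0.2, [0.5, 0.5], rng) for _ in range(20000)]
mean_gamma1 = sum(g[0] for g in draws) / len(draws)
mean_delay12 = sum(g[1] - g[0] for g in draws) / len(draws)
print(mean_gamma1, mean_delay12)
```

With these parameters the sample means should be close to $(1-\rho)/\rho = 4$ for $\Gamma_1$ and $(1-\rho_{1,2})/\rho_{1,2} = 1$ for the first inter-sensor delay.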

While a joint-geometric model is consistent with the Markovian assumption as only the inter-sensor (one-step) propagation parameters are modeled, the change-points at the individual sensors themselves are not geometric. The joint-geometric model can be viewed as a first-order approximation of more realistic propagation scenarios. In particular, note that $\rho \to 1$ corresponds to the case where instantaneous disruption (that is, the event $\{\Gamma_1 = 0\}$) has a high probability of occurrence. On the other hand, $\rho \to 0$ uniformizes the change-point in the sense that the disruption is equally likely to happen at any point in time. This case where the disruption is "rare" is of significant interest in practical systems (Basseville and Nikiforov [15], Poor and Hadjiliadis [130], Tartakovsky and Veeravalli [170, 171, 174], Veeravalli [181]). This is also the case where we will be able to make insightful statements about the structure of the optimal stopping rule. Similarly, we can also distinguish between two extreme scenarios at sensor $\ell$ depending on whether $\rho_{\ell-1,\ell} \to 0$ or $\rho_{\ell-1,\ell} \to 1$. The case where $\rho_{\ell-1,\ell} \to 1$ corresponds to instantaneous change propagation at sensor $\ell$ and $\{\Gamma_\ell = \Gamma_{\ell-1}\}$ with high probability. The case where $\rho_{\ell-1,\ell} \to 0$ corresponds to uniformly likely propagation delay. The widely-used assumption of instantaneous change propagation across sensors is equivalent to assuming $\rho_{\ell-1,\ell} = 1$ for all $\ell = 2, \dots, L$.

1 We assume that the pre-change ($f_0$) and the post-change ($f_1$) densities exist.

2 This is also consistent with an equivalent $(L+2)$-sensor system where sensor indices run through $\{\ell = 0, \dots, L+1\}$.

Observation Model: To simplify the study, we assume that the observations (at every sensor) are independent, conditioned^3 on the change hypothesis corresponding to that sensor, and are identically distributed pre- and post-change, respectively. That is,

    $$ Z_{k,\ell} \sim \begin{cases} \text{i.i.d. } f_0 & \text{if } k < \Gamma_\ell, \\ \text{i.i.d. } f_1 & \text{if } k \ge \Gamma_\ell. \end{cases} $$

We will describe the above assumption as that corresponding to an "i.i.d. observation process." Let $D(f_1, f_0)$ denote the Kullback-Leibler divergence between $f_1$ and $f_0$. That is,

    $$ D(f_1, f_0) = \int \log\frac{f_1(x)}{f_0(x)}\, f_1(x)\, dx. $$

We also assume that the measure described by $f_0$ is absolutely continuous with respect to that described by $f_1$. That is, if $f_1(x) = 0$ for some $x$, then $f_0(x) = 0$. This condition ensures that $E_{f_1}[f_0(X)/f_1(X)] = 1$.

Performance Metrics: We consider a centralized, Bayesian setup where a fusion center has complete knowledge of the observations from all the sensors, $I_k = \{Z_1, \dots, Z_k\}$, in addition to knowledge of the statistics of the change process (equivalently, $\{\rho_{\ell-1,\ell}\}$) and the statistics^4 of the observation process (equivalently, $f_0$ and $f_1$). The fusion center decides whether a change has happened or not based on the information, $I_k$, available to it at time instant $k$ (equivalently, it provides a stopping rule or stopping time $\tau$).

The two conflicting performance measures for quickest change detection are the probability of false alarm, $\mathrm{PFA} = P(\{\tau < \Gamma_1\})$, and the average detection delay, $\mathrm{ADD} = E[(\tau - \Gamma_1)^+]$, where $x^+ = \max(x, 0)$. This conflict is captured by the Bayes risk, defined as

    $$ R(c) = \mathrm{PFA} + c\,\mathrm{ADD} = E\bigl[\mathbb{1}(\{\tau < \Gamma_1\}) + c\,(\tau - \Gamma_1)^+\bigr] $$

for an appropriate choice of per-unit delay cost $c$, where $\mathbb{1}(\{\cdot\})$ is the indicator function of the event $\{\cdot\}$. We will be particularly interested in the regime where $c \to 0$. That is, a regime where minimizing PFA is more important than minimizing ADD, or equivalently, the asymptotics where $\mathrm{PFA} \to 0$.

The goal of the fusion center is to determine

    $$ \tau_{\mathrm{opt}} = \arg\inf_{\tau \in \Delta_\alpha} \mathrm{ADD}(\tau) $$

from the class of change-point detection procedures $\Delta_\alpha = \{\tau : \mathrm{PFA}(\tau) \le \alpha\}$ for which the probability of false alarm does not exceed $\alpha$. In other words, the fusion center needs to come up with a strategy (a stopping rule $\tau$) to minimize the Bayes risk.
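
For a given stopping rule, the three quantities above can be estimated empirically from paired samples of the stopping time and the first change-point. The sketch below shows the bookkeeping only; the function name `bayes_risk` and the sample values are hypothetical and are not taken from the report.

```python
def bayes_risk(stopping_times, change_points, c):
    """Empirical PFA, ADD and Bayes risk R(c) = PFA + c * ADD.

    stopping_times -- samples of the stopping time tau
    change_points  -- matching samples of the first change-point Gamma_1
    c              -- per-unit delay cost
    """
    pairs = list(zip(stopping_times, change_points))
    # False alarm: the rule stops strictly before the change occurs.
    pfa = sum(1 for tau, gamma in pairs if tau < gamma) / len(pairs)
    # Delay: positive part of (tau - Gamma_1).
    add = sum(max(tau - gamma, 0) for tau, gamma in pairs) / len(pairs)
    return pfa, add, pfa + c * add


# Three illustrative runs: one false alarm, one delay of 3, one delay of 0.
pfa, add, risk = bayes_risk([3, 5, 2], [4, 2, 2], c=0.1)
print(pfa, add, risk)
```

Decreasing $c$ reduces the weight of the delay term, so the minimizing rule waits longer before stopping and the false alarm probability falls, in line with the $c \to 0$ regime discussed above.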

1.2.  Dynamic  Programming  Framework 

It is straightforward to check (Shiryaev [149, pp. 151-152]) that the Bayes risk can be written as

    $$ R(c) = P(\{\Gamma_1 > \tau\}) + c\, E\Bigl[\sum_{k=0}^{\tau-1} P(\{\Gamma_1 \le k\} \mid I_k)\Bigr]. $$

Towards solving for the optimal stopping time, we restrict attention to a finite horizon, say the interval $[0, T]$, and proceed via a dynamic programming (DP) argument.

3 More general observation (correlation) models are important in practical settings. This will be the subject of future work.

4 We assume that the fusion center has knowledge of $f_0$ and $f_1$ so that it can use this information to declare that a change has happened. Relaxing this assumption is important in the context of practical applications and is the subject of current work.

The state of the system at time $k$ is the vector $S_k = [S_{k,1}, \dots, S_{k,L}]$ with $S_{k,\ell}$ denoting the state at sensor $\ell$. The state $S_{k,\ell}$ can take the value 1 (post-change), 0 (pre-change), or $t$ (terminal). The system goes to the terminal state $t$ once a change-point decision $\tau$ has been declared. The state evolves as follows:

    $$ S_{k,\ell} = f(S_{k-1,\ell}, \Gamma_\ell, \mathbb{1}(\{\tau < k\})), $$

where the transition function is given as

    $$ f(s, \gamma, a) = \begin{cases} 0 & \text{if } \gamma > k,\ s \ne t,\ a = 0, \\ 1 & \text{if } \gamma \le k,\ s \ne t,\ a = 0, \\ t & \text{if } s = t \text{ or } a = 1, \end{cases} $$

with $S_0 = 0$. Since $S_{k-1}$ captures the information contained in $\{\Gamma_\ell \le j\}$ for $0 \le j \le k-1$ and all $\ell$, given $S_{k-1}$, $\{\Gamma_\ell \le k\}$ is independent of $\{\Gamma_\ell \le j,\ j \le k-1\}$ for all $\ell$. Thus, the state evolution satisfies the Markov condition needed for dynamic programming.
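
The transition function is simple enough to write out directly. The sketch below is an illustration only: the string `"t"` as the terminal marker, the explicit time argument `k`, and the helper `state_path` are choices made for the example rather than notation from the report.

```python
TERMINAL = "t"


def transition(s, gamma, a, k):
    """State of one sensor at time k.

    s      -- previous state: 0 (pre-change), 1 (post-change) or TERMINAL
    gamma  -- change-point Gamma_l of this sensor
    a      -- 1 if a change has already been declared (tau < k), else 0
    """
    if s == TERMINAL or a == 1:
        return TERMINAL
    return 1 if gamma <= k else 0


def state_path(gamma, tau, horizon):
    """States S_{1,l}, ..., S_{horizon,l} for one sensor, starting from 0."""
    states, s = [], 0
    for k in range(1, horizon + 1):
        s = transition(s, gamma, 1 if tau < k else 0, k)
        states.append(s)
    return states


# Change at time 3, decision declared at time 5, horizon of 7 steps.
print(state_path(gamma=3, tau=5, horizon=7))
```

In this example the sensor is pre-change up to time 2, post-change from time 3 through the decision time 5, and terminal afterwards.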

The state is not observable directly, but only through the observations. The observation equation can be written as

    $$ Z_{k,\ell} = V^{(S_{k,\ell})}_{k,\ell}\, \mathbb{1}(\{S_{k,\ell} \ne t\}) + \xi\, \mathbb{1}(\{S_{k,\ell} = t\}), \quad \ell \ge 1, $$

where $V^{(0)}_{k,\ell}$ and $V^{(1)}_{k,\ell}$ are the $k$-th samples from independently generated infinite arrays of i.i.d. data according to $f_0$ and $f_1$, respectively. When the system is in the terminal state, the observations do not matter (since a change decision has already been made) and are hence denoted by a dummy random variable, $\xi$. It is clear that the observation uncertainty $(V^{(0)}_{k,\ell}, V^{(1)}_{k,\ell})$ satisfies the necessary Markov conditions for dynamic programming since these samples are i.i.d. in time.

Finally, the expected cost (Bayes risk) can be expressed as the expectation of an additive cost over time by defining

    $$ g_k(S_k) = c\, \mathbb{1}(\{S_{k,1} = 1\}) $$

and a terminal cost $\mathbb{1}(\{S_{\tau,1} = 0\})$. Thus the problem fits the standard dynamic programming framework with termination (Bertsekas [22]), with the sufficient statistic (belief state) being given by

    $$ P(\{S_k = s_k\} \mid I_k), $$

where $I_k = \{Z_1, \dots, Z_k\}$ for $k$ such that $S_k \ne t$, i.e., $S_{k,\ell} \in \{0, 1\}$ for each $\ell$. Note that this sufficient statistic is described by $2^L$ conditional probabilities, corresponding to the $2^L$ values that $s_k$ can take. We will next see that this sufficient statistic can be further reduced^5 to only $L$ independent probability parameters in the general case.

The fusion center determines $\tau$, and hence, the minimum expected cost-to-go at time $k$ for the above DP problem can be seen to be a function of $I_k$. For a finite horizon $T$, the cost-to-go function is denoted as $J_k^T(I_k)$ and is of the form (see Bertsekas [22, p. 133], Veeravalli [181], for examples of similar nature):

    $$ J_T^T(I_T) = P(\{\Gamma_1 > T\} \mid I_T), $$
    $$ J_k^T(I_k) = \min\bigl\{ P(\{\Gamma_1 > k\} \mid I_k),\ c\,P(\{\Gamma_1 \le k\} \mid I_k) + E[J_{k+1}^T(I_{k+1}) \mid I_k] \bigr\}, \quad 0 \le k < T, $$

where $I_0 = \phi$ is the empty set. The first term in the above minimization corresponds to the cost associated with stopping at time $k$, while the second term corresponds to the cost associated with proceeding to time $k+1$ without stopping. The minimum expected cost for the finite-horizon optimization problem is $J_0^T(\phi)$.

5 This should not be entirely surprising as our assumption of a line (or array) geometry imposes a "natural" ordering on the sensors' change-points. They can be arranged in non-decreasing order: $\Gamma_\ell \ge \Gamma_{\ell-1}$ for all $\ell$.

Recursion for the Sufficient Statistics: We define an $(L+1)$-tuple of conditional probabilities, $\{p_{k,\ell},\ \ell = 1, \dots, L+1\}$:

    $$ p_{k,\ell} \triangleq P(\{\Gamma_{\ell-1} \le k,\ \Gamma_\ell > k\} \mid I_k). $$

We now show that $p_k = [p_{k,1}, \dots, p_{k,L+1}]$ can be obtained from $p_{k-1}$ via a recursive approach. For this, we note that the underlying probability space $\Omega$ in the setup can be partitioned as

    $$ \Omega = \bigcup_{\ell=1}^{L+1} T_{k,\ell}, \quad\text{where}\quad T_{k,\ell} \triangleq \{\Gamma_{\ell-1} \le k,\ \Gamma_\ell > k\}. $$

The event where no sensor has observed the change is denoted as $T_{k,1}$. On the other hand, $T_{k,\ell}$ (for $\ell \ge 2$) corresponds to the event where the maximal index of the sensor that has observed the change before time instant $k$ is $\ell - 1$.

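Observe that $p_{k,\ell}$ is the probability of $T_{k,\ell}$ conditioned on $I_k$. To show that $p_{k,\ell}$ can be written in terms of $p_{k-1}$, the observations $Z_k$ and the prior probabilities, we partition $T_{k,\ell}$ further as

    $$ T_{k,\ell} = \bigcup_{j=1}^{\ell} U_{k,\ell,j}, $$
    $$ U_{k,\ell,j} \triangleq \{\Gamma_{j-1} \le k-1,\ \Gamma_j = k,\ \dots,\ \Gamma_{\ell-1} = k,\ \Gamma_\ell \ge k+1\}, \quad 1 \le j \le \ell. $$

Note that $U_{k,\ell,j} \cap T_{k-1,j} = U_{k,\ell,j}$. Using the new partition $\{U_{k,\ell,j},\ j = 1, \dots, \ell\}$ and applying Bayes' rule repeatedly, it can be checked that $p_{k,\ell}$ can be written as

    $$ p_{k,\ell} = \frac{\sum_{m=1}^{\ell} f(Z_k \mid I_{k-1}, U_{k,\ell,m})\, P(U_{k,\ell,m} \mid I_{k-1})}{\sum_{j=1}^{L+1} \sum_{m=1}^{j} f(Z_k \mid I_{k-1}, U_{k,j,m})\, P(U_{k,j,m} \mid I_{k-1})} \triangleq \frac{\mathcal{N}_\ell}{\sum_{j=1}^{L+1} \mathcal{N}_j}, $$    (10.1)

where $f(\cdot \mid \cdot)$ denotes the conditional probability density function of $Z_k$ and $\mathcal{N}_\ell$ denotes the numerator term.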

From the i.i.d. assumption on the statistics of the observations, the first term within the summation for $\mathcal{N}_\ell$ can be written as:

$$f(Z_k \mid I_{k-1}, U_{k,\ell,m}) = \prod_{j=1}^{\ell-1} f_1(Z_{k,j}) \prod_{j=\ell}^{L} f_0(Z_{k,j}) = \prod_{j=1}^{\ell-1} L_{k,j} \prod_{j=1}^{L} f_0(Z_{k,j}),$$

where $L_{k,j} = f_1(Z_{k,j}) / f_0(Z_{k,j})$ is the likelihood ratio of the two hypotheses given that $Z_{k,j}$ is observed at the $j$-th sensor at the $k$-th instant. For the second term, observe from the definitions that

$$P(U_{k,\ell,m} \mid I_{k-1}) = P(T_{k-1,m} \mid I_{k-1}) \cdot \frac{P(U_{k,\ell,m})}{P(T_{k-1,m})}.$$
Thus, we have

$$\mathcal{N}_\ell = \left( \sum_{m=1}^{\ell} \frac{P(U_{k,\ell,m})}{P(T_{k-1,m})}\, p_{k-1,m} \right) \cdot \prod_{m=1}^{\ell-1} L_{k,m} \prod_{m=1}^{L} f_0(Z_{k,m}) = \left( \sum_{m=1}^{\ell} w_{k,\ell,m}\, p_{k-1,m} \right) \cdot \Phi_{\mathrm{obs}}(k,\ell), \tag{10.2}$$

where the first part is a weighted sum of $p_{k-1,m}$ with weights decided by the prior probabilities, and the second part of the evolution equation, $\Phi_{\mathrm{obs}}(k,\ell)$, can be viewed as that part that depends only on the observation $Z_k$.

Several observations are in order at this stage:

• The above expansion for $\mathcal{N}_\ell$ can be explained intuitively: If the maximal sensor index observing the change by time $k$ is $\ell - 1$, then the maximal sensor index observing the change by time $k - 1$ should be from the set $\{0, \dots, \ell - 1\}$.

• Using the joint-geometric model for $\{\Gamma_\ell\}$, it can be shown that $w_{k,\ell,m}$ is of the form:

$$w_{k,\ell,m} = \frac{P(U_{k,\ell,m})}{P(T_{k-1,m})} = (1 - \rho_{\ell-1,\ell}) \cdot \prod_{j=m-1}^{\ell-2} \rho_{j,j+1} \triangleq (1 - \rho_{\ell-1,\ell}) \cdot w_{\ell,m},$$

$$\mathcal{N}_\ell = \prod_{m=1}^{\ell-1} L_{k,m} \cdot \prod_{m=1}^{L} f_0(Z_{k,m}) \cdot (1 - \rho_{\ell-1,\ell}) \cdot \left( \sum_{m=1}^{\ell} w_{\ell,m}\, p_{k-1,m} \right), \tag{10.3}$$

with the understanding that the product term in the definition of $w_{\ell,m}$ is vacuous (and is to be replaced by 1) if $m = \ell$. It is important to note that the joint-geometric assumption renders the weights ($w_{k,\ell,m}$) associated with $p_{k-1,m}$ independent of $k$. This will be useful later in establishing convergence properties for the DP.

• It is important to note that given a fixed value of $\ell$, $p_{k,\ell}$ is dependent on the entire vector $p_{k-1}$ and not on $p_{k-1,\ell}$ alone. Thus, the recursion for $\mathcal{N}_\ell$ implies that $p_k$ forms the sufficient statistic and the function $J_k^T(I_k)$ can be written as a function of only $p_k$, say $J_k^T(p_k)$. The finite-horizon DP equations can then be rewritten as

$$J_T^T(p_T) = p_{T,1}, \qquad J_k^T(p_k) = \min\left\{ p_{k,1},\ c\,(1 - p_{k,1}) + A_k^T(p_k) \right\},$$

with

$$A_k^T(p_k) \triangleq E\left[ J_{k+1}^T(p_{k+1}) \mid I_k \right] = \int \left[ J_{k+1}^T(p_{k+1})\, f(Z_{k+1} \mid I_k) \right] \Big|_{Z_{k+1} = z}\, dz.$$

Note that the previously established recursion for $p_{k+1}$ implies that $p_{k+1} = g(p_k, Z_{k+1})$ for an appropriate choice of $g(\cdot,\cdot)$ (the precise form of $g(\cdot,\cdot)$ is clear from equations (10.1) and (10.2)), which ensures that the right-hand side is indeed a function of $p_k$.
• It is easy to check that the general framework reduces to the special case when all the change-points coincide with $\Gamma_1$. In this case, as in Veeravalli [181], we define $p_k = P(\{\Gamma_1 \le k\} \mid I_k)$. Note that only $T_{k,1}$ and $T_{k,L+1}$ are non-empty sets with

$$T_{k,1} = \{\Gamma_1 \ge k+1\}, \qquad T_{k,L+1} = \{\Gamma_1 \le k\},$$

$$p_{k,L+1} = p_k, \qquad p_{k,1} = 1 - p_k, \qquad p_{k,\ell} = 0,\ \ell = 2, \dots, L.$$

Furthermore, the recursion for $p_k$ reduces to

$$p_k = \frac{\mathcal{N}}{\prod_{j=1}^{L} f_0(Z_{k,j})\,(1 - p_{k-1})(1 - \rho) + \mathcal{N}}, \qquad \mathcal{N} = \prod_{j=1}^{L} f_1(Z_{k,j})\,\big( (1 - p_{k-1})\rho + p_{k-1} \big),$$

which coincides with Veeravalli [181, eqn. (13)-(15)]. This case can also be obtained from the formula in (10.3) by setting $\rho_{\ell-1,\ell} = 1$ for all $\ell$ with $2 \le \ell \le L$.
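The recursion in (10.1)-(10.3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the list layout `rho[j] = rho_{j,j+1}` (with `rho[0]` the parameter of the first change-point and `rho[L] = 0`, i.e., assuming the convention that sensor $L+1$ never changes), and the unit-variance Gaussian helper are assumptions made here, not part of the report.

```python
import math

def gauss_pdf(z, mean):
    """Unit-variance Gaussian density (hypothetical choice of f0 / f1)."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def transition_weight(l, m, rho):
    """w_{k,l,m} = (1 - rho_{l-1,l}) * prod_{j=m-1}^{l-2} rho_{j,j+1}.

    rho[j] holds rho_{j,j+1}; rho[0] is the parameter of the first
    change-point and rho[L] = 0 (assumed convention). The product is
    empty (equal to 1) when m == l.
    """
    w = 1.0 - rho[l - 1]
    for j in range(m - 1, l - 1):  # j = m-1, ..., l-2
        w *= rho[j]
    return w

def update_posterior(p_prev, z, rho, f0, f1):
    """One step of the recursion: p_{k-1} and Z_k -> p_k.

    p_prev[l-1] = p_{k-1,l} for l = 1..L+1, and z[j-1] = Z_{k,j}.
    """
    L = len(z)
    numerators = []
    for l in range(1, L + 2):
        # Sensors 1..l-1 are post-change (f1), sensors l..L pre-change (f0).
        lik = 1.0
        for j in range(1, L + 1):
            lik *= f1(z[j - 1]) if j <= l - 1 else f0(z[j - 1])
        prior = sum(transition_weight(l, m, rho) * p_prev[m - 1]
                    for m in range(1, l + 1))
        numerators.append(lik * prior)
    total = sum(numerators)
    return [n / total for n in numerators]
```

Setting `rho = [rho_1, 1.0, ..., 1.0, 0.0]` in this sketch reproduces the single change-point recursion quoted above, which is a convenient sanity check.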

1.3. Structure of the Optimal Stopping Rule ($\tau_{\mathrm{opt}}$)

The goal of this section is to study the structure of the optimal stopping rule, $\tau_{\mathrm{opt}}$. For this, we follow the same outline as in [22] and study the infinite-horizon version of the DP problem by letting $T \to \infty$.

Theorem 1.1. Let $p = [p_1, \dots, p_{L+1}]$ be an element of the standard $L$-dimensional simplex $\mathcal{P}$, defined as $\mathcal{P} \triangleq \{p : \sum_{j=1}^{L+1} p_j = 1\}$. The infinite-horizon cost-to-go for the DP is of the form

$$J(p) = \min\left\{ p_1,\ c\,(1 - p_1) + A_J(p) \right\},$$

where the function $A_J(p)$: i) is concave in $p$ over $\mathcal{P}$; ii) is bounded as $0 \le A_J(p) \le 1$; and iii) satisfies $A_J(p) = 0$ over the hyperplane $\{p : p_1 = 0\}$.

Proof.  See  Ragha van  and  Veeravalli  [133].  □ 

At this stage, it is a straightforward consequence that the optimal stopping rule is of the form

$$\tau_{\mathrm{opt}} = \inf\left\{ k : p_{k,1}(1 + c) - c < A_J(p_k) \right\}.$$

That is, a change is declared when the hyperplane on the left side is exceeded by $A_J(p_k)$ and no change is declared, otherwise. We will next see that this test characterization reduces to a degenerate one as $\rho \to 0$. To establish this degeneracy, we define the following one-to-one and invertible transformation:


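$$q_{k,\ell} \triangleq \frac{p_{k,\ell}}{\rho\, p_{k,1}}, \quad \ell = 1, \dots, L+1,$$

which is equivalent to

$$p_{k,1} = \frac{1}{1 + \rho \sum_{j=2}^{L+1} q_{k,j}} \quad \text{and} \quad p_{k,\ell} = \frac{\rho\, q_{k,\ell}}{1 + \rho \sum_{j=2}^{L+1} q_{k,j}}, \quad \ell = 2, \dots, L+1.$$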
We can write $q_{0,\ell}$ in terms of the priors as

$$q_{0,1} = \frac{p_{0,1}}{\rho\, p_{0,1}} = \frac{1}{\rho},$$

$$q_{0,\ell} = \frac{p_{0,\ell}}{\rho\, p_{0,1}} = \frac{P(\{\Gamma_1 = \cdots = \Gamma_{\ell-1} = 0,\ \Gamma_\ell > 0\})}{\rho\, P(\{\Gamma_1 > 0\})} = \frac{\prod_{j=0}^{\ell-2} \rho_{j,j+1}\,(1 - \rho_{\ell-1,\ell})}{\rho\,(1 - \rho)}, \quad \ell = 2, \dots, L+1.$$

Note that while $p_{k,\ell}$ are conditional probabilities of certain events, and hence, lie in the interval $[0, 1]$, the range of $q_{k,\ell}$ is in general $[0, \infty)$.

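It can be checked that the evolution equation can be rewritten in terms of $q_{k,\ell}$ as

$$q_{k,\ell} = \frac{1 - \rho_{\ell-1,\ell}}{1 - \rho} \cdot \prod_{j=1}^{\ell-1} L_{k,j} \cdot \left( \sum_{m=1}^{\ell} w_{\ell,m}\, q_{k-1,m} \right). \tag{10.4}$$

It is interesting to note from (10.4) that the update for $q_{k,\ell}$ is a weighted sum of $q_{k-1,j},\ j = 1, \dots, \ell$, with progressively decreasing weight as $j$ decreases (since $w_{\ell,j-1} = \rho_{j-2,j-1}\, w_{\ell,j} \le w_{\ell,j}$). Similarly, we can define $J(\cdot)$ and $A_J(\cdot)$ in terms of $\mathbf{q}_k$. Using the transformation $\{q_{k,\ell}\}$, $\tau_{\mathrm{opt}}$ is seen to have the form:

$$\tau_{\mathrm{opt}} = \inf\left\{ k : \sum_{\ell=2}^{L+1} q_{k,\ell} > \frac{1 - A_J(\mathbf{q}_k)}{\rho\,\big(c + A_J(\mathbf{q}_k)\big)} \right\}.$$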

When all $\Gamma_\ell$ coincide, we have

$$q_{k,L+1} = \frac{p_k}{\rho\,(1 - p_k)}, \qquad q_{k,1} = \frac{1}{\rho}, \qquad q_{k,\ell} = 0,\ \ell = 2, \dots, L.$$

Further, it is straightforward to check that the evolution in (10.4) reduces to

$$q_{k,L+1} = \frac{\prod_{j=1}^{L} L_{k,j}}{1 - \rho}\,\big(1 + q_{k-1,L+1}\big). \tag{10.5}$$


Thus, the space of sufficient statistics and the optimal test reduce to a one-dimensional variable ($p_k = P(\{\Gamma_1 \le k\} \mid I_k)$ or equivalently, $q_k$) and a threshold test on $p_k$ (or equivalently, on $q_k$), respectively. In the general case, unless something more is known about the structure of $A_J(\cdot)$ (which is possible if there is some structure on $\{\rho_{\ell-1,\ell}\}$), we cannot say more about $\tau_{\mathrm{opt}}$. Nevertheless, the following theorem establishes its structure in the practical setting of a rare disruption regime ($\rho \to 0$). The limiting test, denoted as $\nu_A$, thresholds (from below) the a posteriori probability that no change has happened.


Theorem 1.2. The test structure corresponding to $\tau_{\mathrm{opt}}$ converges in probability to a simple threshold operation in the asymptotic limit as $\rho \to 0$. This limiting test is of the form:

$$\nu_A = \begin{cases} \text{Stop} & \text{if } \log\left( \sum_{\ell=2}^{L+1} q_{k,\ell} \right) \ge A \\ \text{Continue} & \text{if } \log\left( \sum_{\ell=2}^{L+1} q_{k,\ell} \right) < A \end{cases}$$

for an appropriate choice of threshold $A$.
Proof. See Raghavan and Veeravalli [133]. □

The test $\nu_A$ is of low complexity because of the following properties: i) a simple recursion formula (10.4) for the sufficient statistics; ii) a threshold operation for stopping; and iii) a threshold value that can be pre-computed given the PFA constraint (see Prop. 1.3).
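To illustrate how little computation the test needs, the following Python sketch implements one step of the recursion (10.4) and the threshold check on $\log \sum_{\ell \ge 2} q_{k,\ell}$. The function names and the list convention `rho[j] = rho_{j,j+1}` (with `rho[0]` the parameter of the first change-point and `rho[L] = 0`) are assumptions made for this example only.

```python
import math

def weight(l, m, rho):
    """w_{l,m} = prod_{j=m-1}^{l-2} rho_{j,j+1}; equals 1 when m == l."""
    w = 1.0
    for j in range(m - 1, l - 1):  # j = m-1, ..., l-2
        w *= rho[j]
    return w

def update_q(q_prev, lr, rho):
    """One step of (10.4).

    q_prev[l-1] = q_{k-1,l} for l = 1..L+1, lr[j-1] = L_{k,j} for j = 1..L,
    rho[j] = rho_{j,j+1} with rho[L] = 0 (assumed convention).
    """
    L = len(lr)
    q = [1.0 / rho[0]]  # q_{k,1} = 1/rho for every k
    prod_lr = 1.0
    for l in range(2, L + 2):
        prod_lr *= lr[l - 2]  # running product of L_{k,j}, j = 1..l-1
        s = sum(weight(l, m, rho) * q_prev[m - 1] for m in range(1, l + 1))
        q.append((1.0 - rho[l - 1]) / (1.0 - rho[0]) * prod_lr * s)
    return q

def should_stop(q, A):
    """Threshold test: stop when log(sum_{l>=2} q_{k,l}) >= A."""
    total = sum(q[1:])
    return total > 0.0 and math.log(total) >= A
```

Each time step costs on the order of $L^2$ multiplications in this naive form, with no integration or cost-to-go evaluation.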

The fact that $\tau_{\mathrm{opt}} \to \nu_A$ for an appropriate choice of $A$ does not imply that $\nu_A$ is asymptotically (as PFA $\to 0$ or as $\rho \to 0$) optimal. However, the low complexity of this test, in addition to Theorem 1.2, and the fact that the structure of $A_J(\mathbf{q}_k)$ (and hence, $\tau_{\mathrm{opt}}$) is not known suggest that it is a good candidate test for change detection across a sensor array. In fact, we will see this to be the case when we establish sufficient conditions under which $\nu_A$ is asymptotically optimal.


1.4. Main Results on $\nu_A$

Towards this end, our main interest is in understanding the performance (ADD vs. PFA) of $\nu_A$ for any general choice of threshold $A$.

Special Cases of Change Parameters: To build intuition, we start by considering some special scenarios of change propagation modeling. The first scenario corresponds to the case where one (or more) of the $\rho_{\ell-1,\ell}$ is 1. The following proposition addresses this setting.


Proposition 1.1. Consider an $L$-sensor system described in Sec. 1.1, parameterized by $\{\rho_{\ell-1,\ell}\}$, where $\rho_{\ell',\ell'+1} = 1$ for some $\ell'$ and $\max_{j \ne \ell'} \rho_{j,j+1} < 1$. This system is equivalent to an $(L-1)$-sensor system, parameterized by $\{\delta_{j,j+1}\}$, where

$$\delta_{j,j+1} = \rho_{j,j+1},\ j \le \ell' - 1, \qquad \delta_{j,j+1} = \rho_{j+1,j+2},\ j \ge \ell',$$

with the $(\ell'+1)$-th sensor observing (a combination of) $Z_{k,\ell'+1}$ and $Z_{k,\ell'+2}$ with a geometric delay parameter of $1 = \rho_{\ell',\ell'+1}$.

Proof. The proof is straightforward by studying the evolution of $\{q_{k,\ell}\}$ for the original $L$-sensor system. From (10.4), it can be seen that $q_{k,\ell'+1} = 0$ (identically) for all $k$ and the reduced $(L-1)$-dimensional system discards this redundant information, while the observation corresponding to the $(\ell'+1)$-th sensor is carried over to the $(\ell'+2)$-th original sensor. □


The second scenario corresponds to the case where one (or more) of the $\rho_{\ell-1,\ell}$ is 0.

Proposition 1.2. Consider an $L$-sensor system, parameterized by $\{\rho_{\ell-1,\ell}\}$, with $\ell'$ indicating the smallest index such that $\rho_{\ell',\ell'+1} = 0$. This system is equivalent to an $\ell'$-sensor system with the same parameters as that of the original system. It is as if sensors $(\ell'+1)$ and beyond do not exist (or contribute) in the context of change detection.

Proof. The proof is again straightforward by considering the evolution of $\{q_{k,\ell}\}$ in (10.4) and noting that $q_{k,j},\ j \ge \ell'+2$, are identically 0 for all $k$. □

It  is  useful  to  interpret  Props.  1.1  and  1.2  via  an  “information  flow”  paradigm.  If  change  propagation  is 
instantaneous  across  a  sensor  (corresponding  to  the  first  case),  it  is  as  if  the  fusion  center  is  oblivious  to  the 
presence  of  that  sensor  conditioned  upon  the  previous  sensors’  observations.  In  this  setting,  the  detection 
delay  corresponding  to  that  sensor  is  zero,  as  would  be  expected  from  the  fact  that  the  geometric  parameter 
is 1. In the second case, information flow to the fusion center (concerning change) is cut off or blocked past the first sensor with a geometric parameter of 0. That is, the observations made by sensors $\{\ell'+1, \dots, L\}$ (if any) do not contribute information to the fusion center in helping it decide whether the disruption has happened or not. Apart from these extreme cases of oblivious/blocking sensors, we can assume without loss in generality that

$$0 < \min_{\ell} \rho_{\ell-1,\ell} \le \max_{\ell} \rho_{\ell-1,\ell} < 1.$$

Continuity arguments suggest that if some $\rho_{\ell-1,\ell}$ is small (but non-zero), it should be natural to expect that the $\ell$-th sensor and beyond may not "effectively" contribute any information to the fusion center. We will interpret this observation after establishing performance bounds for $\nu_A$.

Probability of False Alarm: We first show that letting $A \to \infty$ in $\nu_A$ corresponds to considering the regime where PFA $\to 0$.

Proposition 1.3. The probability of false alarm with $\nu_A$ can be upper bounded as

$$\mathrm{PFA} \le \frac{1}{1 + \rho \cdot \exp(A)}.$$

That is, if $\alpha \le 1$ and the threshold $A$ is set as $A = \log\left( \frac{1}{\rho\alpha} \right)$, then PFA $\le \alpha$.

Proof. See Raghavan and Veeravalli [133]. □
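As a quick numerical illustration of Prop. 1.3, the sketch below computes the threshold for a target false alarm level and evaluates the stated bound. The function names are hypothetical; only the two formulas from the proposition are used.

```python
import math

def threshold_for_pfa(alpha, rho):
    """Threshold A = log(1 / (rho * alpha)) suggested by Prop. 1.3."""
    return math.log(1.0 / (rho * alpha))

def pfa_upper_bound(A, rho):
    """Upper bound 1 / (1 + rho * exp(A)) on the false alarm probability."""
    return 1.0 / (1.0 + rho * math.exp(A))

# With this choice of A the bound equals alpha / (1 + alpha), which is
# slightly below the target alpha.
A = threshold_for_pfa(alpha=1e-3, rho=0.001)
bound = pfa_upper_bound(A, rho=0.001)
```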

Universal Lower Bound on ADD: We now establish a lower bound on ADD for the class of stopping times $\Delta_\alpha$. That is, any stopping time $\tau$ should have an ADD larger than the lower bound if PFA is to be smaller than $\alpha$.

Proposition 1.4. Consider the class of stopping times $\Delta_\alpha = \{\tau : \mathrm{PFA}(\tau) \le \alpha\}$. Under the assumption that $\min_{\ell = 2, \dots, L} \rho_{\ell-1,\ell} > 0$, we have

$$\inf_{\tau \in \Delta_\alpha} \mathrm{ADD}(\tau) \ge \frac{|\log(\alpha)| \cdot (1 + o(1))}{L\, D(f_1, f_0) + |\log(1 - \rho)|} \quad \text{as } \alpha \to 0,$$

where the $o(1)$ term converges to zero as $\alpha \to 0$.

Proof. See Raghavan and Veeravalli [133]. □


Upper Bound on ADD of $\nu_A$: We will now establish an upper bound on ADD of $\nu_A$.

Theorem 1.3. Let $\{\rho_{\ell-1,\ell}\}$ be such that $0 < \min_\ell \rho_{\ell-1,\ell} \le \max_\ell \rho_{\ell-1,\ell} < 1$. Further, assume that $D(f_1, f_0)$ is such that there exists some $j$ satisfying $\ell \le j \le L$ and


D(fi,fo)  > 


j-e  + 1 


\  i  Pj,j+ 1  J 


(10.6) 


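for all $2 \le \ell \le L$. Then, the performance of $\nu_A$ with $A = \log\left( \frac{1}{\rho\alpha} \right)$ is given by

$$E[\nu_A] \le \frac{|\log(\rho\alpha)| \cdot (1 + o(1))}{L\, D(f_1, f_0) + |\log(1 - \rho)|} \quad \text{as } \alpha \to 0.$$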


Corollary 1.1. Combining Prop. 1.4 and Theorem 1.3, it can be seen that $\nu_A$ is asymptotically optimal (as $\alpha \to 0$) for any fixed $\rho > 0$. In other words,

$$\inf_{\tau \in \Delta_\alpha} \mathrm{ADD}(\tau) \sim E[\nu_A],$$

where we have used the notation $X_\alpha \sim Y_\alpha$ as $\alpha \to 0$ to mean $\lim_{\alpha \to 0} X_\alpha / Y_\alpha = 1$.
The proof of Theorem 1.3 in the general case of an arbitrary number ($L$) of sensors with an arbitrary choice of $\{\rho_{\ell-1,\ell}\}$ results in cumbersome analysis. Hence, it is worthwhile to consider the special case of two sensors that can be captured by just two change parameters: $\rho$ and $\rho_{1,2}$. The main idea that is necessary in tackling the general case is easily exposed in the $L = 2$ setting in the following result. The general case is carefully studied in Raghavan and Veeravalli [133].

Proposition 1.5. ($L = 2$) The stopping time $\nu_A$ is such that $\nu_A \to \infty$ as $A \to \infty$. Further, if $D(f_1, f_0)$ satisfies

$$D(f_1, f_0) > \log\left( 2 - \rho - \rho_{1,2} \right),$$

we also have

$$\lim_{A \to \infty} \frac{E[\nu_A]}{A} \le \frac{1}{2 D(f_1, f_0) + |\log(1 - \rho)|}.$$


1.5.  Discussion  and  Numerical  Results 

Discussion: A loose sufficient condition for all the $L$ sensors to contribute to the slope of ADD of $\nu_A$ is that


D(fi,  fo)  >  max  min  - 
v  ;  e=i,-,L- 1  j>e+ 1  j  - 


A-iogf  ^-o(1~^1>) 
^  \  1  Pj,j+ 1  J 


=  7w 


Another  sufficient  condition  is  that 


D(fu  fo)  >  £=m£ix_i  •  log  ^1  -  p  +  -  Pjj+i)  j 


That  is,  if  p  is  such  that 


P  >  ~  w-m)> 


1=2 


then  7„  <  0  and  the  condition  of  Theorem  1.3  reduces  to  a  mild  one  that  the  K-L  divergence  between 
$f_1$ and $f_0$ be positive. A special setting where the above condition is true (irrespective of the rarity of the disruption-point) is the regime where change propagates across the sensor array "quickly." The case of instantaneous propagation is an extreme example of this regime and Theorem 1.3 recaptures this extreme case.

In more general regimes where change propagates across the sensor array "slowly", either the disruption-point should become less rare (independent of the choice of $f_1$ and $f_0$) or the densities $f_1$ and $f_0$ should be sufficiently discernible (independent of the rarity of the disruption-point) so that all the $L$ sensors can contribute to the asymptotic slope. When these conditions fail to hold, it is not clear whether the theorems are applicable, or even if all the $L$ sensors contribute to the slope of $E[\nu_A]$. Nevertheless, it is reasonable to conjecture that as long as $\min_\ell \rho_{\ell-1,\ell} > 0$, all the $L$ sensors contribute to the asymptotic slope.

However, the difference between the asymptotic and the non-asymptotic regimes needs a careful revisit. Following the initial remark (Prop. 1.2) on the extreme case of blocking sensors (where some $\rho_{\ell-1,\ell} = 0$), in the more realistic case where some $\rho_{\ell-1,\ell}$ may be small (but non-zero), it is possible that if $D(f_1, f_0)$ is smaller than some threshold value (determined by the change propagation parameters), not all of the $L$ sensors may "effectively" contribute to the slope of ADD, at least for reasonably small, but non-asymptotic values of PFA. For example, see the ensuing discussion where numerical results illustrate this behavior at PFA values of $10^{-4}$ to $10^{-5}$ for some choice of change propagation parameters, even when the condition in Theorem 1.3 is met. When the condition in Theorem 1.3 is not met, such a behavior is expected to be more typical.
Numerical Study I - Performance Improvement with $\nu_A$: Given that the structure of $\tau_{\mathrm{opt}}$ is not known in closed form, we now present numerical studies to show that $\nu_A$ results in substantial improvement in performance over both a single sensor test (which uses the observations only from the first sensor and ignores the other sensor observations) and a test that uses the observations from all the sensors but under a mismatched model (where the change-point for all the sensors is assumed to be the same), even under realistic modeling assumptions.


Figure 10.2: Probability of false alarm vs. average detection delay for an $L = 2$ setting with $\rho = 0.001$ and $\rho_{1,2} = 0.1$.

The first example corresponds to a two sensor system where the occurrence of change is modeled as a geometric random variable with parameter $\rho = 0.001$. Change propagates from the first sensor to the second with the geometric parameter $\rho_{1,2} = 0.1$. The pre- and post-change densities are $\mathcal{N}(0, 1)$ and $\mathcal{N}(1, 1)$, respectively, so that $D(f_1, f_0) = 0.50$. While the threshold for $\nu_A$ is set as in Prop. 1.3, the thresholds for the single sensor and mismatched tests are set as in [171]. The recursion for the sufficient statistic of the mismatched test follows the description in Veeravalli [181]. Figure 10.2 depicts the performance of the three tests obtained via Monte Carlo methods and shows that $\nu_A$ can result in an improvement of at least 4 units of delay at even marginally large PFA values on the order of $10^{-3}$.
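The quoted divergence can be checked directly. For two unit-variance Gaussians with means $\mu_1$ and $\mu_0$, the K-L divergence is $(\mu_1 - \mu_0)^2 / 2$; the sketch below evaluates this closed form and, as a cross-check, a midpoint-rule approximation of $\int f_1 \log(f_1/f_0)$. The function names and integration settings are illustrative assumptions.

```python
import math

def kl_unit_variance_gaussians(mu1, mu0=0.0):
    """Closed form D(f1, f0) for f1 = N(mu1, 1) and f0 = N(mu0, 1)."""
    return 0.5 * (mu1 - mu0) ** 2

def kl_numeric(mu1, mu0=0.0, half_width=12.0, steps=24000):
    """Midpoint-rule approximation of the integral of f1 * log(f1 / f0)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = mu1 - half_width + (i + 0.5) * h
        f1 = math.exp(-0.5 * (x - mu1) ** 2) / math.sqrt(2.0 * math.pi)
        # log f1(x) - log f0(x); the normalizing constants cancel.
        log_ratio = -0.5 * (x - mu1) ** 2 + 0.5 * (x - mu0) ** 2
        total += f1 * log_ratio * h
    return total
```

For a post-change mean of 1 this gives 0.5; means of 0.75 and 1.2 give 0.28125 and 0.72, the values used in the other examples of this section.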


Figure 10.3: Probability of false alarm vs. average detection delay for a typical $L = 5$ setting.

The second example corresponds to a five sensor system where $\rho = 0.005$. Change propagates across the array according to the following model: $\rho_{1,2} = 0.1$, $\rho_{2,3} = 0.2$, $\rho_{3,4} = 0.5$ and $\rho_{4,5} = 0.7$. The pre- and the post-change densities are $\mathcal{N}(0, 1)$ and $\mathcal{N}(0.75, 1)$ so that $D(f_1, f_0) \approx 0.2813$. With $D(f_1, f_0)$ and the change parameters as above, Theorem 1.3 assures us that at least $L = 2$ sensors contribute to the ADD vs. PFA slope asymptotically. On the other hand, Figure 10.3 shows that more than two sensors indeed contribute to the slope. Thus, it can be seen that Theorem 1.3 provides only a sufficient condition on performance bounds. It is also worth noting the transition in slope (unlike the case in Veeravalli [181]) for both the mismatched test and $\nu_A$ as PFA decreases from moderately large values to zero, whereas the slope of the single sensor test (as expected) remains constant.

Numerical Study II - Performance Gap Between the Tests: We now present a second case-study with the main goal being the understanding of the relative performance of $\nu_A$ with respect to the single sensor and the mismatched tests. We again consider an $L = 2$ sensor system and we vary the change process parameters, $\rho$ and $\rho_{1,2}$, in this study. The pre- and the post-change densities are $\mathcal{N}(0, 1)$ and $\mathcal{N}(1.2, 1)$ so that $D(f_1, f_0) = 0.72$.


Figure 10.4: Probability of false alarm vs. average detection delay for an $L = 2$ setting with different model parameters. Panel titles: $\rho = 0.2$; $\rho = 0.1$; $\rho = 0.001$; $\rho = 0.0001$ (each with $\rho_{1,2} = 0.25$, $L = 2$).

Figure 10.4 and Figure 10.5(b) show the performance of the three tests with varying $\rho$ parameters for a fixed choice of $\rho_{1,2}$. We observe that the gap in performance between the single sensor test and $\nu_A$ increases as $\rho$ decreases, whereas the gap between $\nu_A$ and the mismatched test stays fairly constant. Similarly, Figure 10.5 shows the performance of the three tests with varying $\rho_{1,2}$ parameters for a fixed choice of $\rho$. We observe from these plots that the gap between the mismatched test and $\nu_A$ increases as $\rho_{1,2}$ decreases, whereas the gap between the single sensor test and $\nu_A$ increases as $\rho_{1,2}$ increases.

The choice of $D(f_1, f_0) = 0.72$ is such that the sufficient condition in Theorem 1.3 is satisfied, independent of the change parameters. Hence, we expect the slope of the ADD vs. PFA plot to be of the form $\frac{1}{2 D(f_1, f_0) + |\log(1 - \rho)|}$ asymptotically as PFA $\to 0$. Nevertheless, Figure 10.5(c) and (d) show that, when both $\rho$ and $\rho_{1,2}$ are small, the slope of $\nu_A$ is only as good as (or slightly better than) the single sensor test, which is known to have a slope of the form $\frac{1}{D(f_1, f_0) + |\log(1 - \rho)|}$. Thus, we see that even though our theory guarantees that both the sensors' observations contribute in the eventual performance of $\nu_A$ asymptotically, we may not see this behavior for reasonable choices of PFA such as $10^{-4}$. The case of observation models not meeting the conditions of Theorem 1.3 is expected to show this trend for even lower PFA values.
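The claim that $D(f_1, f_0) = 0.72$ satisfies the two-sensor condition regardless of the change parameters follows from $\log(2 - \rho - \rho_{1,2}) < \log 2 \approx 0.693$. The short sketch below checks the condition of Prop. 1.5 over the parameter values used in the figures and compares the two asymptotic slope expressions; the function names are hypothetical.

```python
import math

def two_sensor_condition(D, rho, rho12):
    """Sufficient condition of Prop. 1.5: D > log(2 - rho - rho_{1,2})."""
    return D > math.log(2.0 - rho - rho12)

def asymptotic_slope(D, rho, n_sensors):
    """Slope 1 / (n * D + |log(1 - rho)|) of delay vs. |log PFA|."""
    return 1.0 / (n_sensors * D + abs(math.log(1.0 - rho)))

D = 0.72
params = [(0.2, 0.25), (0.1, 0.25), (0.001, 0.25), (0.0001, 0.25),
          (0.01, 0.75), (0.01, 0.25), (0.01, 0.05), (0.01, 0.01)]
condition_holds = all(two_sensor_condition(D, r, r12) for r, r12 in params)

# Two contributing sensors give a smaller slope than a single sensor.
slope_two = asymptotic_slope(D, 0.01, 2)
slope_one = asymptotic_slope(D, 0.01, 1)
```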

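To summarize these observations, if $\mathrm{ADD}_{\nu_A}$, $\mathrm{ADD}_{\mathrm{MM}}$ and $\mathrm{ADD}_{\mathrm{SS}}$ denote the average detection delays for $\nu_A$, mismatched and single sensor tests (respectively) for some fixed choice of PFA, then

$$\mathrm{ADD}_{\mathrm{MM}} - \mathrm{ADD}_{\nu_A} \propto \frac{1}{\rho_{1,2}} \quad \text{and independent of } \rho$$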

ADDss-ADD^  oc 

It is interesting to note from the above equations that $\rho_{1,2}$ impacts the gap between the two tests in a contrasting way. The test $\nu_A$ is expected to result in significant performance improvement in the regime where $\rho$ is small, but $\rho_{1,2}$ is neither too small nor too large. In fact, this regime where $\nu_A$ is expected to result in significant performance improvement is the precise regime that is of importance in practical contexts. This is so because we can expect the occurrence of disruption (e.g., cracks in bridges, intrusions in networks, onset of epidemics, etc.) to be a rare phenomenon. Once the disruption occurs, we expect change to propagate across the sensor array fairly quickly due to the geographical proximity (or network proximity, in the case of computer networks) of the other sensors, but not so quickly that the extreme case of instantaneous propagation is applicable. Classifying the regime of $\{\rho_{\ell-1,\ell}\}$ and $D(f_1, f_0)$ where significant performance improvement is possible with $\nu_A$ is ongoing work. It is also of interest to come up with better test structures in the regime where $\nu_A$ does not lead to a significant performance improvement.

1.6.  Concluding  Remarks 

We considered the centralized, Bayesian version of the change process detection problem in this work and posed it in the classical dynamic programming framework. This formulation of the change detection problem allows us to establish the sufficient statistics for the DP under study and a recursion for the sufficient statistics. While we obtain the broad structure of the optimal stopping rule ($\tau_{\mathrm{opt}}$), any further insights into it are rendered infeasible by the complicated nature of the infinite-horizon cost-to-go function. Nevertheless, $\tau_{\mathrm{opt}}$ reduces to a threshold rule (denoted in this work as $\nu_A$) in the rare disruption regime.

The test $\nu_A$ possesses the following properties and thus serves as an attractive test for practical applications that can be modeled with a change process: i) it is of low complexity; ii) under certain mild sufficient conditions (more specifically, if the K-L divergence $D(f_1, f_0)$ is more than a number determined by the parameters of the change process), it is asymptotically optimal in the small PFA regime; and iii) numerical studies suggest that it can lead to substantially improved performance over naive tests. Nevertheless, the asymptotic expansion of ADD in terms of log(PFA) is not enough to determine how small the false alarm probability should be in order for this expansion and asymptotic optimality of $\nu_A$ to hold. Studies indicate that PFA should be chosen significantly smaller than those needed for good approximations in the simpler quickest detection problems solved earlier by the same approach.
Figure 10.5: Probability of false alarm vs. average detection delay for an $L = 2$ setting with different model parameters. Panels (all with $\rho = 0.01$, $L = 2$): (a) $\rho_{1,2} = 0.75$; (b) $\rho_{1,2} = 0.25$; (c) $\rho_{1,2} = 0.05$; (d) $\rho_{1,2} = 0.01$.


2.  Data-Efficient  Quickest  Change  Detection  with  On-Off  Observation  Control 
2.1.  Introduction 

In the Bayesian quickest change detection problem proposed by Shiryaev [147], there is a sequence of random variables, $\{X_n\}$, whose distribution changes at a random time $\Gamma$. It is assumed that before $\Gamma$, $\{X_n\}$ are independent and identically distributed (i.i.d.) with density $f_0$, and after $\Gamma$ they are i.i.d. with density $f_1$. The distribution of $\Gamma$ is assumed to be known and modeled as a geometric random variable with parameter $\rho$. The objective is to find a stopping time $\tau$, at which time the change is declared, such that the average detection delay is minimized subject to a constraint on the probability of false alarm.

In this paper we extend Shiryaev's formulation by explicitly accounting for the cost of the observations used in the detection process. We capture the observation penalty (cost) through the average number of observations used before the change point $\Gamma$, and allow for a dynamic control policy that determines whether or not a given observation is taken. The objective is to choose the observation control policy along with the stopping time $\tau$, so that the average detection delay is minimized subject to constraints on the probability of false alarm and the observation cost.
2.2. Problem Formulation and the Two-threshold Algorithm

As in the model for the classical Bayesian quickest change detection problem described in Section 2.1, we have a sequence of random variables $\{X_n\}$, which are i.i.d. with density $f_0$ before the random change point $\Gamma$, and i.i.d. with density $f_1$ after $\Gamma$. The change point $\Gamma$ is modeled as geometric with parameter $\rho$, i.e., for $0 < \rho < 1$, $0 \le \pi_0 < 1$,

$$\pi_k = P\{\Gamma = k\} = \pi_0\, \mathbb{1}_{\{k=0\}} + (1 - \pi_0)\, \rho\, (1 - \rho)^{k-1}\, \mathbb{1}_{\{k \ge 1\}},$$

where $\mathbb{1}$ is the indicator function, and $\pi_0$ represents the probability of the change having happened before the observations are taken. Typically $\pi_0$ is set to 0.
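The change-point prior can be written down and sampled in a few lines. The sketch below assumes that the geometric term applies for $k \ge 1$ and that the mass $\pi_0$ sits at $k = 0$; the function names are hypothetical.

```python
import random

def change_point_pmf(k, rho, pi0=0.0):
    """P{Gamma = k}: mass pi0 at k = 0, geometric tail for k >= 1."""
    if k == 0:
        return pi0
    return (1.0 - pi0) * rho * (1.0 - rho) ** (k - 1)

def sample_change_point(rho, pi0=0.0, rng=random):
    """Draw one change point from the prior by sequential coin flips."""
    if rng.random() < pi0:
        return 0
    k = 1
    while rng.random() >= rho:  # no change at step k with probability 1 - rho
        k += 1
    return k
```

With $\pi_0 = 0$ the mean of this prior is $1/\rho$, so a small $\rho$ corresponds to a rare change.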

In order to minimize the average number of observations used before $\Gamma$, at each time instant, a decision is made on whether to use the observation in the next time step, based on all the available information. Let $S_k \in \{0, 1\}$, with $S_k = 1$ if it has been decided to take the observation at time $k$, i.e., $X_k$ is available for decision making, and $S_k = 0$ otherwise. Thus, $S_k$ is an on-off (binary) control input based on the information available up to time $k - 1$, i.e.,

$$S_k = \mu_{k-1}(I_{k-1}), \quad k = 1, 2, \dots$$

with $\mu$ denoting the control law and $I$ defined as:

$$I_k = \left[ S_1, \dots, S_k,\ X_1^{(S_1)}, \dots, X_k^{(S_k)} \right].$$

Here, $X_i^{(S_i)}$ represents $X_i$ if $S_i = 1$, otherwise $X_i$ is absent from the information vector $I_k$. The choice of $S_1$ is based on the prior $\pi_0$.

As in the classical change detection problem, the end goal is to choose a stopping time on the observation sequence at which time the change is declared. Denoting the stopping time by $\tau$, we can define the average detection delay (ADD) as

$$\mathrm{ADD} = E\left[ (\tau - \Gamma)^+ \right].$$

Further, we can define the probability of false alarm (PFA) as

$$\mathrm{PFA} = P(\tau < \Gamma).$$

The new performance metric for our problem is the average number of observations (ANO) used before $\Gamma$ in detecting the change:

$$\mathrm{ANO} = E\left[ \sum_{k=1}^{\min\{\tau,\ \Gamma - 1\}} S_k \right].$$
Let $\gamma = \{\tau, \mu_0, \dots, \mu_{\tau-1}\}$ represent a policy for cost-efficient quickest change detection. We wish to solve the following optimization problem:

$$\begin{aligned} &\underset{\gamma}{\text{minimize}} \quad \mathrm{ADD}(\gamma), \\ &\text{subject to} \quad \mathrm{PFA}(\gamma) \le \alpha, \ \text{ and } \ \mathrm{ANO}(\gamma) \le \beta, \end{aligned} \tag{10.7}$$

where $\alpha$ and $\beta$ are given constraints. Towards solving (10.7), we consider a Lagrangian relaxation of this problem which can be approached using dynamic programming.

The Lagrangian relaxation of the optimization problem in (10.7) is

$$R(\gamma) = \min_{\gamma}\ \mathrm{ADD}(\gamma) + \lambda_f\, \mathrm{PFA}(\gamma) + \lambda_e\, \mathrm{ANO}(\gamma), \tag{10.8}$$

where $\lambda_f$ and $\lambda_e$ are Lagrange multipliers. It is easy to see that if $\lambda_f$ and $\lambda_e$ can be found such that the solution to (10.8) achieves the PFA and ANO constraints with equality, then the solution to (10.8) is also the solution to (10.7).
The problem in (10.8) can be converted to an appropriate Markov control problem using steps similar to those followed in [131].

Let $\Theta_k$ denote the state of the system at time $k$. After the stopping time $\tau$ it is assumed that the system enters a terminal state $\mathcal{T}$ and stays there. For $k \le \tau$, we have $\Theta_k = 0$ for $k < \Gamma$, and $\Theta_k = 1$ otherwise. Then we can write

$$\mathrm{ADD} = E\left[ \sum_{k=0}^{\tau-1} \mathbb{1}_{\{\Theta_k = 1\}} \right] \quad \text{and} \quad \mathrm{PFA} = E\left[ \mathbb{1}_{\{\Theta_\tau = 0\}} \right].$$

Furthermore,  let  Dk  denote  the  stopping  decision  variable  at  time  k,  i.e.,  Dk  =  0  if  k  <  t  and  Dk  =  1 
otherwise.  Then  the  optimization  problem  in  (10.8)  can  be  written  as  a  minimization  of  an  additive  cost  over 
time: 


R( 7)  =  min  E 


^  '  < ?fc(0fc,  Dk, 

_k= 0 


with 

gk(0,d,s)  =  1{6 i^r}  [l{e=i}l{d=o}  +  -V  l{6»=o}l{d=i}  +Ae  l{<9=o}l{s=t}l{d=o}]  • 

Using standard arguments [21] it can be seen that this optimization problem can be solved using infinite horizon dynamic programming with sufficient statistic (belief state) given by

$$p_k = \mathrm{P}\{\Theta_k = 1 \mid I_k\} = \mathrm{P}\{\Gamma \le k \mid I_k\}.$$

Using Bayes' rule, $p_k$ can be shown to satisfy the recursion

$$p_{k+1} = \begin{cases} \Phi^{(0)}(p_k) & \text{if } S_{k+1} = 0, \\ \Phi^{(1)}(X_{k+1}, p_k) & \text{if } S_{k+1} = 1, \end{cases}$$

where

$$\Phi^{(0)}(p_k) = p_k + (1 - p_k)\rho \tag{10.9}$$

and

$$\Phi^{(1)}(X_{k+1}, p_k) = \frac{\Phi^{(0)}(p_k)\,L(X_{k+1})}{\Phi^{(0)}(p_k)\,L(X_{k+1}) + \left(1 - \Phi^{(0)}(p_k)\right)} \tag{10.10}$$

with $L(X_{k+1}) = f_1(X_{k+1})/f_0(X_{k+1})$ being the likelihood ratio, and $p_0 = \pi_0$. Note that the structure of the recursion for $p_k$ is independent of time $k$.
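
As an illustration, the belief recursion (10.9)-(10.10) can be sketched in a few lines of Python. The function names and the Gaussian likelihood ratio (for $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\theta,1)$, the pair used later in Section 2.8) are choices made for this example only; `rho` stands for the prior parameter appearing in (10.9).

```python
import math

def phi0(p, rho):
    """Belief update (10.9): no observation is taken (S_{k+1} = 0)."""
    return p + (1.0 - p) * rho

def phi1(x, p, rho, lr):
    """Belief update (10.10): X_{k+1} = x is observed (S_{k+1} = 1)."""
    q = phi0(p, rho)              # prior-only prediction of the belief
    num = q * lr(x)               # weight of the post-change hypothesis
    return num / (num + (1.0 - q))

def gaussian_lr(theta):
    """Likelihood ratio f1/f0 for f0 = N(0,1), f1 = N(theta,1) (assumed example)."""
    return lambda x: math.exp(theta * x - 0.5 * theta * theta)

if __name__ == "__main__":
    lr = gaussian_lr(0.75)
    p = 0.0                       # p_0 = pi_0 = 0 in this example
    for x in (0.1, 1.2, 0.9):     # hypothetical observations
        p = phi1(x, p, rho=0.01, lr=lr)
    print(p)
```

An observation with likelihood ratio equal to one (here $x = \theta/2$) leaves the prior-only prediction unchanged, which is a quick way to check an implementation.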

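The optimal policy for the problem given in (10.8) can be obtained from the solution to the Bellman equation:

$$J(p_k) = \min_{d_k,\, s_{k+1}}\ \lambda_f (1 - p_k)\,\mathbf{1}_{\{d_k = 1\}} + \mathbf{1}_{\{d_k = 0\}}\left[p_k + B_0(p_k)\,\mathbf{1}_{\{s_{k+1} = 0\}} + \left(\lambda_e (1 - p_k) + B_1(p_k)\right)\mathbf{1}_{\{s_{k+1} = 1\}}\right] \tag{10.11}$$

with

$$B_0(p_k) = J\left(\Phi^{(0)}(p_k)\right)$$

and

$$B_1(p_k) = \mathrm{E}\left[J\left(\Phi^{(1)}(X_{k+1}, p_k)\right)\right].$$

It can be shown by an induction argument (see, e.g., [131]) that $J$, $B_0$ and $B_1$ are all non-negative concave functions on the interval $[0, 1]$, and that $J(1) = B_0(1) = B_1(1) = 0$. Also, by Jensen's inequality,

$$B_1(p) \le J\left(\mathrm{E}\left[\Phi^{(1)}(X, p)\right]\right) = B_0(p), \quad p \in [0, 1].$$

Let

$$d(p_k) = B_0(p_k) - B_1(p_k).$$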
Then, from the above properties of $J$, $B_0$ and $B_1$, it is easy to show that the optimal policy $\gamma^* = (\tau^*, \mu_0^*, \mu_1^*, \ldots, \mu_{\tau-1}^*)$ for the problem given in (10.8) has the following structure:

$$S_{k+1}^* = \mu_k^*(p_k) = \begin{cases} 0 & \text{if } d(p_k) < \lambda_e (1 - p_k), \\ 1 & \text{if } d(p_k) \ge \lambda_e (1 - p_k), \end{cases} \qquad \tau^* = \inf\{k \ge 1 : p_k > A^*\}. \tag{10.12}$$

Remark 2.1. Since $d(p_k) \ge 0$ for all $k$, the algorithm in (10.12) reduces to the classical Shiryaev test when $\lambda_e = 0$ [147].
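
The structure in (10.12) can be explored numerically. The following is a minimal value-iteration sketch for the Bellman equation (10.11), assuming Gaussian densities $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(\theta,1)$; the grid sizes and the parameter values (`rho`, `theta`, `lam_f`, `lam_e`) are hypothetical defaults chosen for illustration, not values taken from this report. The expectation in $B_1$ is evaluated by a simple rectangle rule against the predictive density of $X_{k+1}$ given the belief.

```python
import numpy as np

def solve_bellman(rho=0.05, theta=0.75, lam_f=100.0, lam_e=0.5,
                  n_p=201, n_x=401, tol=1e-6, max_iter=3000):
    """Value iteration for (10.11) on a grid of beliefs p in [0, 1]."""
    p = np.linspace(0.0, 1.0, n_p)
    x = np.linspace(-8.0, theta + 8.0, n_x)
    dx = x[1] - x[0]
    f0 = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    f1 = np.exp(-0.5 * (x - theta)**2) / np.sqrt(2.0 * np.pi)

    q = p + (1.0 - p) * rho                            # Phi0(p), eq. (10.9)
    mix = q[:, None] * f1 + (1.0 - q[:, None]) * f0    # predictive density of X_{k+1}
    post = q[:, None] * f1 / mix                       # Phi1(x, p), eq. (10.10)
    w = mix * dx                                       # quadrature weights

    def expected_costs(J):
        B0 = np.interp(q, p, J)                        # J(Phi0(p))
        Jpost = np.interp(post.ravel(), p, J).reshape(post.shape)
        B1 = np.sum(w * Jpost, axis=1)                 # E[J(Phi1(X, p))]
        return B0, B1

    J = np.zeros(n_p)
    for _ in range(max_iter):
        B0, B1 = expected_costs(J)
        cont = p + np.minimum(B0, lam_e * (1.0 - p) + B1)   # cost of continuing
        J_new = np.minimum(lam_f * (1.0 - p), cont)         # stop vs. continue
        done = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if done:
            break

    B0, B1 = expected_costs(J)
    cont = p + np.minimum(B0, lam_e * (1.0 - p) + B1)
    observe = (B0 - B1) >= lam_e * (1.0 - p)           # observation rule in (10.12)
    stop = lam_f * (1.0 - p) <= cont                   # stopping region
    return p, J, B0, B1, observe, stop

if __name__ == "__main__":
    p, J, B0, B1, observe, stop = solve_bellman()
    print("smallest belief at which stopping is optimal:", p[stop].min())
    print("smallest belief at which observing is optimal:", p[observe].min())
```

Inspecting the `observe` and `stop` arrays for a given parameter choice is one way to see whether the optimal policy has the threshold form on the grid; the sketch itself makes no claim that it always does.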


The optimal stopping rule $\tau^*$ is similar to the one of the Shiryaev problem. However, the observation control is not explicit, and one has to evaluate the differential cost function $d(p_k)$ at each time step to decide on the choice of $S_{k+1}$. Nevertheless, numerical studies of the Bellman equation in (10.11) show that, for most choices of $\rho$, $f_0$ and $f_1$ that we have tried, the optimal algorithm in (10.12) has the following two-threshold structure.

Start with $p_0 = 0$ and use the following control, with $B < A$, for $k \ge 0$:

$$S_{k+1} = \mu_k(p_k) = \begin{cases} 0 & \text{if } p_k < B, \\ 1 & \text{if } p_k \ge B, \end{cases} \qquad \tau = \inf\{k \ge 1 : p_k > A\}. \tag{10.13}$$

The probability $p_k$ is updated using (10.9) and (10.10).
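
The two-threshold rule (10.13) is straightforward to simulate. The sketch below is one possible implementation under stated assumptions: the change point is drawn from a geometric prior with parameter `rho` (consistent with $p_0 = 0$ and the update (10.9)), and the observations are Gaussian with unit variance and a mean shift `theta` at the change point. All names are illustrative.

```python
import math
import random

def run_two_threshold(A, B, rho, theta, rng, max_steps=10**6):
    """Simulate one path of the two-threshold policy (10.13).

    Returns (tau, gamma, n_obs_before_change, p_tau).
    """
    # Assumed change-point model: P(Gamma = k) = rho * (1 - rho)**(k - 1), k >= 1.
    gamma = 1
    while rng.random() >= rho:
        gamma += 1

    p = 0.0                  # p_0 = 0
    n_obs_before = 0         # observations used at times k < Gamma (counts towards ANO)
    for k in range(1, max_steps + 1):
        q = p + (1.0 - p) * rho                    # prior-only update (10.9)
        if p >= B:                                 # S_k = 1: take the observation
            mean = theta if k >= gamma else 0.0
            x = rng.gauss(mean, 1.0)
            lr = math.exp(theta * x - 0.5 * theta * theta)
            p = q * lr / (q * lr + (1.0 - q))      # Bayes update (10.10)
            if k < gamma:
                n_obs_before += 1
        else:                                      # S_k = 0: skip the observation
            p = q
        if p > A:                                  # stopping rule of (10.13)
            return k, gamma, n_obs_before, p
    raise RuntimeError("threshold A not crossed within max_steps")

if __name__ == "__main__":
    rng = random.Random(1)
    tau, gamma, n_obs, p_tau = run_two_threshold(0.98, 0.2, 0.01, 0.5, rng)
    print(tau, gamma, n_obs, p_tau)
```

Averaging `max(tau - gamma, 0)`, `1 - p_tau` and `n_obs_before` over many paths gives Monte Carlo estimates of ADD, PFA and ANO, respectively.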


From a practical point of view, even if the two-threshold policy (10.13) is not optimal, one would like to use it for the following reasons. First, the choice of thresholds uniquely determines the probability of false alarm and the average number of observations used before change in $\gamma(A, B)$. Second, apart from being simple, the two-threshold policy (10.13) is asymptotically optimal. In Section 2.3, we provide an asymptotic analysis of $\gamma(A, B)$, and use the analytical results to show in Section 2.9.1 that $\gamma(A, B)$ is asymptotically optimal in the following sense. If

$$\Delta(\alpha, \beta) = \{\gamma : \mathrm{PFA}(\gamma) \le \alpha;\ \mathrm{ANO}(\gamma) \le \beta\},$$

then for a fixed $\beta$ and $\rho$,

$$\mathrm{ADD}\left(\gamma(A(\alpha,\beta), B(\alpha,\beta))\right) = \inf_{\gamma \in \Delta(\alpha,\beta)} \mathrm{ADD}(\gamma)\,(1 + o(1)) \quad \text{as } \alpha \to 0.$$

Here, $g(x) = o(1)$ as $x \to x_0$ is used to denote that $g(x) \to 0$ in the specified limit. Also, for each $\beta$, $B(\alpha, \beta)$ is the smallest $B$ such that $\mathrm{ANO}(\gamma(A, B(\alpha, \beta))) \le \beta$ as $A \to 1$, and $A(\alpha, \beta)$ is such that, for a fixed $\beta$, $\mathrm{PFA}(\gamma(A(\alpha, \beta), B(\alpha, \beta))) \le \alpha$ as $\alpha \to 0$. The result holds because the best possible asymptotic delay for the class of algorithms $\Delta(\alpha, \beta)$ is the delay of the Shiryaev test, and we show in Section 2.6 that the asymptotic delay of $\gamma(A, B)$, for a fixed $B$ and $\rho$, converges to the Shiryaev delay as $\alpha \to 0$.

While the asymptotic optimality ensures good performance of $\gamma(A, B)$ for low values of PFA, it is important to know how well $\gamma(A, B)$ performs, for moderate PFA values, as compared to the optimal solution of (10.7). In Section 2.9.2, we obtain ANO-ADD trade-off curves for $\gamma(A, B)$, and show that it is possible to achieve ANO values as low as 30% of $\mathrm{E}[\Gamma]$ by incurring a delay penalty of less than 10% (Figure 10.7), and without affecting the PFA values. Thus, the only case where any other control policy can significantly outperform $\gamma(A, B)$ is when the PFA constraint is moderate and the ANO constraint is small.

Finally, since $\gamma(A, B)$ uses the state of the system for observation control, we will show in Section 2.9.3 that this results in a significant reduction in the observation cost as compared to the scheme where the observations are skipped randomly.

In Section 2.8, we will provide a set of approximations using which the asymptotic expressions for the two-threshold algorithm $\gamma(A, B)$ can be computed. This can be used to choose the thresholds of the algorithm that satisfy a given set of constraints $\alpha$ and $\beta$. We summarize the results and comment on future work in Section 2.10.


2.3. Asymptotic Analysis of $\gamma(A, B)$

In this section we derive asymptotic approximations for ADD, PFA and ANO for the two-threshold policy $\gamma(A, B)$. To that end, we first convert the recursion for $p_k$ (see (10.9) and (10.10)) to a form that is amenable to asymptotic analysis.

Define $Z_k = \log\frac{p_k}{1 - p_k}$ for $k \ge 0$. This new variable $Z_k$ has a one-to-one mapping with $p_k$. By defining

$$a = \log\frac{A}{1 - A}, \qquad b = \log\frac{B}{1 - B},$$

we can write the recursions (10.9) and (10.10) in terms of $Z_k$. For $k \ge 1$,

$$Z_{k+1} = Z_k + \log L(X_{k+1}) + |\log(1 - \rho)| + \log\left(1 + \rho e^{-Z_k}\right), \quad \text{if } Z_k \in [b, a), \tag{10.14}$$

and

$$Z_{k+1} = Z_k + |\log(1 - \rho)| + \log\left(1 + \rho e^{-Z_k}\right), \quad \text{if } Z_k \notin [b, a), \tag{10.15}$$

with

$$Z_1 = \log\left(e^{Z_0} + \rho\right) + |\log(1 - \rho)| + \log\left(L(X_1)\right)\mathbf{1}_{\{Z_0 \in [b, a)\}}.$$

Here we have used the fact that $S_{k+1} = 1$ if $p_k \in [B, A)$, and $S_{k+1} = 0$ otherwise (see (10.12)). The crossing of thresholds $A$ and $B$ by $p_k$ is equivalent to the crossing of thresholds $a$ and $b$ by $Z_k$. Thus the stopping time for $\gamma(A, B)$ (equivalently $\gamma(a, b)$, with some abuse of notation) is

$$\tau = \inf\{k \ge 1 : Z_k > a\}.$$

In this section we study the asymptotic behavior of $\gamma(a, b)$ in terms of $Z_k$, under various limits of $a$, $b$ and $\rho$. Specifically, we provide two asymptotic expressions for ADD, one for fixed thresholds $a$, $b$, as $\rho \to 0$, and another for fixed $b$ and $\rho$, as $a \to \infty$. We also provide, as $a \to \infty$ and $\rho \to 0$, an asymptotic expression for PFA for fixed $b$. Finally, we also provide asymptotic estimates of the average number of observations used before (ANO) and after the change point $\Gamma$. Note that the limit $a \to \infty$ corresponds to PFA going to zero.

Figure 10.6 shows a typical evolution of $\gamma(a, b)$, i.e., of $Z_k$ using (10.14) and (10.15), starting at time 0. Note that for $Z_k \in [b, a)$, recursion (10.14) is employed, while outside that interval, recursion (10.15), which only uses the prior $\rho$, is employed. As a result $Z_k$ increases monotonically outside $[b, a)$.

From Figure 10.6 again, each time $Z_k$ crosses $b$ from below, it can either increase to $a$ (point $\tau$), or it can go below $b$ and approach $b$ monotonically from below, at which time it faces a similar set of alternatives. Thus the passage to threshold $a$ possibly involves multiple cycles of the evolution of $Z_k$ below $b$. We will show in Section 2.6 that after the change point $\Gamma$, following a finite number of cycles below $b$, $Z_k$ grows up to cross $a$, and the time spent on the cycles below $b$ is insignificant as compared to $\tau - \Gamma$, as $a \to \infty$. In fact we show that, asymptotically, the time to reach $a$ is equal to the time taken by the classical Shiryaev algorithm to reach $a$. (Note that for the classical Shiryaev algorithm the evolution of $Z_k$ would be based on (10.14).)

When $Z_k$ crosses $a$ from below, it does so with an overshoot. Overshoots play a significant role in the performance of many sequential algorithms (see [153], [171]) and they are central to the performance of $\gamma(a, b)$ as well. In Section 2.5, we show that PFA depends on the threshold $a$ and the overshoot $(Z_\tau - a)$ as $a \to \infty$, but is not a function of the threshold $b$.
Figure 10.6: Evolution of $Z_k$ for $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.5, 1)$, and $\rho = 0.01$, with thresholds $a = 3.89$ and $b = -1.38$, corresponding to the $p_k$ thresholds $A = 0.98$ and $B = 0.2$, respectively. Also $Z_0 = b$.
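
The correspondence between the belief domain and the log-odds domain can be checked numerically. The sketch below (function names are ours) implements one step of the log-odds recursion and one step of the belief recursion, so that the two can be compared, and recomputes the thresholds quoted in the caption of Figure 10.6 from $A = 0.98$ and $B = 0.2$.

```python
import math

def logit(p):
    """Log-odds Z = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def z_step(z, rho, log_lr=None):
    """One step of the log-odds recursion; log_lr is None when no observation is taken."""
    z_next = z + abs(math.log(1.0 - rho)) + math.log(1.0 + rho * math.exp(-z))
    if log_lr is not None:
        z_next += log_lr          # add log L(X_{k+1}) when the observation is used
    return z_next

def p_step(p, rho, lr=None):
    """One step of the belief recursion (10.9)/(10.10); lr is None when no observation is taken."""
    q = p + (1.0 - p) * rho
    if lr is None:
        return q
    return q * lr / (q * lr + (1.0 - q))

if __name__ == "__main__":
    print(logit(0.98), logit(0.2))            # thresholds a and b of Figure 10.6
    p, rho, lr = 0.3, 0.01, 1.7               # hypothetical values
    print(logit(p_step(p, rho, lr)), z_step(logit(p), rho, math.log(lr)))
```

The two numbers printed on the last line should agree up to floating-point error, since both recursions describe the same posterior.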

The number of observations taken during the detection process is the total time spent by $Z_k$ between $b$ and $a$. As $a \to \infty$, $Z_k$ crosses $a$ only after the change point $\Gamma$, with high probability. The total number of observations taken can thus be divided into two parts: the part taken before $\Gamma$ (ANO), which is the fraction of time $Z_k$ is above $b$ (and hence depends only on $b$), and the part consumed after $\Gamma$. In Section 2.7 we show that, asymptotically, the average number of observations used after $\Gamma$ is approximately equal to the delay itself.

In Section 2.8 we provide approximations using which the asymptotic expressions can be computed, and provide numerical results to demonstrate that under various scenarios, for limiting as well as moderate values of $a$, $b$, and $\rho$, our asymptotic expressions for ADD, PFA and ANO provide good approximations. In Section 2.9 we use the asymptotic expressions for ADD, PFA and ANO to show some optimality properties of $\gamma(a, b)$.

We begin our analysis by first obtaining the asymptotic overshoot distribution for $(Z_\tau - a)$ using nonlinear renewal theory [153, 186]. As mentioned above, this will be critical to the PFA analysis.

In what follows, we use $\mathrm{E}_\ell$ and $\mathrm{P}_\ell$ to denote, respectively, the expectation and probability measure when the change happens at time $\ell$. We use $\mathrm{E}_\infty$ and $\mathrm{P}_\infty$ to denote, respectively, the expectation and probability measure when the entire sequence $\{X_n\}$ is i.i.d. with density $f_0$. Also recall that $g(x) = o(1)$ as $x \to x_0$ is used to denote that $g(x) \to 0$ in the specified limit.

2.4.  Asymptotic  Overshoot 

In this section we characterize the overshoot distribution of $Z_k$ as it crosses $a$ as $a \to \infty$. In analyzing the trajectory of $Z_k$, it is useful to allow for an arbitrary starting point $Z_0$ (shifting the time axis). We first combine the recursions in (10.14) and (10.15) to get:

$$Z_{k+1} = Z_k + \mathbf{1}_{\{Z_k \in [b, a)\}}\log L(X_{k+1}) + |\log(1 - \rho)| + \log\left(1 + e^{-Z_k}\rho\right).$$

By defining $Y_k = \log L(X_k) + |\log(1 - \rho)|$ and expanding the above recursion, we can write an expression for $Z_n$:

$$\begin{aligned} Z_n &= \sum_{k=1}^{n} Y_k + \log\left(e^{Z_0} + \rho\right) + \sum_{k=1}^{n-1} \log\left(1 + e^{-Z_k}\rho\right) - \sum_{k=1}^{n} \mathbf{1}_{\{Z_k < b\}} \log L(X_k) \\ &= \sum_{k=1}^{n} Y_k + \eta_n. \end{aligned} \tag{10.16}$$

Here $\eta_n$ is used to represent all terms other than the first in the equation above:

$$\eta_n = \log\left(e^{Z_0} + \rho\right) + \sum_{k=1}^{n-1} \log\left(1 + e^{-Z_k}\rho\right) - \sum_{k=1}^{n} \mathbf{1}_{\{Z_k < b\}} \log L(X_k). \tag{10.17}$$

As defined in [153], $\eta_n$ is a slowly changing sequence if

$$n^{-1} \max\{|\eta_1|, \ldots, |\eta_n|\} \to 0 \quad \text{in probability}, \tag{10.18}$$

and for every $\epsilon > 0$, there exist $n^*$ and $\delta > 0$ such that for all $n \ge n^*$

$$\mathrm{P}\left\{\max_{1 \le k \le n\delta} |\eta_{n+k} - \eta_n| > \epsilon\right\} < \epsilon. \tag{10.19}$$

If indeed $\{\eta_n\}$ is a slowly changing sequence, then the distribution of $Z_\tau - a$, as $a \to \infty$, is equal to the asymptotic distribution of the overshoot when the random walk $\sum_{k=1}^{n} Y_k$ crosses a large positive boundary. We have the following result.

Theorem 2.1. Let $R(x)$ be the asymptotic distribution of the overshoot when the random walk $\sum_{k=1}^{n} Y_k$ crosses a large positive boundary under $\mathrm{P}_1$. Then for fixed $\rho$ and $b$, under $\mathrm{P}_1$, we have the following:

1. $\{\eta_n\}$ is a slowly changing sequence.

2. $R(x)$ is the distribution of $Z_\tau - a$ as $a \to \infty$, i.e.,

$$\lim_{a \to \infty} \mathrm{P}_1\left[Z_\tau - a \le x \mid \tau \ge \Gamma\right] = R(x). \tag{10.20}$$


2.5.  PFA  Analysis 

We first obtain an expression for PFA as a function of the overshoot when $Z_k$ crosses $a$.

Lemma 2.1. For fixed $\rho$ and $b$,

$$\mathrm{PFA} = \mathrm{E}[1 - p_\tau] = e^{-a}\,\mathrm{E}\left[e^{-(Z_\tau - a)} \mid \tau \ge \Gamma\right](1 + o(1)) \quad \text{as } a \to \infty.$$

From Lemma 2.1, it is evident that PFA depends on the overshoot when $Z_k$ crosses $a$ as $a \to \infty$. Since the overshoot has an asymptotic distribution (Theorem 2.1) that depends only on the densities $f_0$, $f_1$ and the prior $\rho$, and is independent of $b$, it is natural to expect that as $a \to \infty$, PFA is completely characterized by the asymptotic distribution $R(x)$ and is not a function of the threshold $b$. This is indeed true and is established in the following theorem.


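Theorem 2.2. For a fixed $b$ and $\rho$,

$$\mathrm{PFA}(\gamma(a, b)) = \left(e^{-a} \int_0^\infty e^{-x}\, dR(x)\right)(1 + o(1)) \quad \text{as } a \to \infty. \tag{10.21}$$


2.6. Delay Analysis

The PFA for $\gamma(a, b)$ has the following bound:

$$\mathrm{PFA} = \mathrm{E}[1 - p_\tau] \le 1 - A = \frac{1}{1 + e^{a}} \le e^{-a}. \tag{10.22}$$

Using this upper bound we can show that the ADD of $\gamma(a, b)$ is given by:

$$\mathrm{ADD} = \mathrm{E}\left[(\tau - \Gamma)^{+}\right] = \mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]\,(1 + o(1)) \quad \text{as } a \to \infty. \tag{10.23}$$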
In the following we provide two different expressions for $\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]$. The first one is obtained by keeping $b$ fixed and taking $\rho \to 0$. This expression will be used to get accurate delay estimates for $\gamma(a, b)$ in Section 2.8.

Next, we will provide another asymptotic expression for $\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]$ for a fixed $b$ and $\rho$, as $a \to \infty$. We show that in this limit, $\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]$ converges to the Shiryaev delay. This fact will be used to prove the asymptotic optimality of $\gamma(a, b)$ in Section 2.9.

It was discussed in reference to Figure 10.6 that each time $Z_k$ crosses $b$ from below, it faces two alternatives: to cross $a$ without ever coming back to $b$, or to go below $b$ and cross it again from below. It was mentioned that the passage to the threshold $a$ is through multiple such cycles. Motivated by this we define the following stopping times $\lambda$ and $\Lambda$:

$$\lambda = \inf\{k \ge 1 : Z_k \notin [b, a),\ Z_0 = b\}, \tag{10.24}$$

and

$$\Lambda = \inf\{k \ge 1 : Z_k > a \ \text{or}\ \exists\, k \ \text{s.t.}\ Z_{k-1} < b \ \text{and}\ Z_k \ge b,\ Z_0 = b\}. \tag{10.25}$$

Let $t(x, y)$ be the constant time taken by $Z_k$ to move from $Z_0 = x$ to $y$ using the recursion (10.15), i.e.,

$$t(x, y) = \inf\{k \ge 0 : Z_k > y,\ Z_0 = x,\ x, y \in [b, a)\}. \tag{10.26}$$

Then, we can write $\Lambda$ as a function of $\lambda$ using (10.26):

$$\Lambda = \left(\lambda + t(Z_\lambda, b)\right)\mathbf{1}_{\{Z_\lambda < b\}} + \lambda\,\mathbf{1}_{\{Z_\lambda > a\}} = \lambda + t(Z_\lambda, b)\,\mathbf{1}_{\{Z_\lambda < b\}}.$$

The significance of these stopping times is as follows. If we start the process at $Z_0 = b$ and reset $Z_k$ to $b$ each time it crosses $b$ from below, then the time taken by $Z_k$ to move from $b$ to $a$ is the sum of a finite but random number of random variables with the distribution of $\Lambda$, say $\Lambda_1, \Lambda_2, \ldots, \Lambda_N$. For $i = 1, \ldots, N - 1$, $Z_{\Lambda_i} < b$, and $Z_{\Lambda_N} > a$. Thus the time to reach $a$ in this case is $\sum_{k=1}^{N} \Lambda_k$. Let

$$\mathrm{ADD}^s = \mathrm{E}_1\left[\sum_{k=1}^{N} \Lambda_k\right].$$


The behavior of the delay path depends on $Z_\Gamma$, the value of $Z_k$ at the change point $\Gamma$, and how $Z_k$ evolves after that point. We use $\{Z_k \nearrow b\}$ to indicate that $Z_k$ approaches $b$ from below for some $k > \Gamma$, i.e., $\exists\, k > \Gamma$ s.t. $Z_{k-1} < b$, $Z_k \ge b$, and use $\{Z_k \nearrow a\}$ to represent the event that $Z_k$ crossed $a$ without ever coming back to $b$, i.e., $Z_k > b$, $\forall k > \Gamma$. We define the following three disjoint events:

$$\mathcal{A} = \{Z_\Gamma < b\},$$

$$\mathcal{B} = \{Z_\Gamma \ge b;\ Z_k \nearrow b\},$$

$$\mathcal{C} = \{Z_\Gamma \ge b;\ Z_k \nearrow a\}.$$

Thus, under the event $\mathcal{A}$, the process $Z_k$ starts below $b$ at $\Gamma$, and reaches $a$ after multiple up-crossings of the threshold $b$. Under the event $\mathcal{B}$, the process $Z_k$ starts above $b$ at $\Gamma$, and crosses $b$ before $a$. It then has multiple up-crossings of $b$, similar to the case of event $\mathcal{A}$. Under event $\mathcal{C}$, the process $Z_k$ starts above $b$ at $\Gamma$, and reaches $a$ without ever coming below $b$.

Also define

$$\lambda(x) = \inf\{k \ge 1 : Z_k \notin [b, a),\ Z_0 = x,\ b \le x < a\}, \tag{10.27}$$

and let $\Lambda(x)$ be defined with $Z_0 = x$ similar to (10.25). Thus, $\lambda$ and $\lambda(b)$ have the same distribution. Similarly, $\Lambda$ and $\Lambda(b)$ are identically distributed.

The  following  theorem  gives  an  asymptotic  expression  for  the  conditional  delay. 
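Theorem 2.3. For fixed values of the thresholds $a$, $b$, the conditional delay is given by

$$\begin{aligned} \mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] = \Big[\, & \mathrm{ADD}^s\; \mathrm{P}(\mathcal{A} \cup \mathcal{B} \mid \tau \ge \Gamma) \\ & + \mathrm{E}[\lambda(Z_\Gamma) \mid \mathcal{C}, \tau \ge \Gamma]\; \mathrm{P}(\mathcal{C} \mid \tau \ge \Gamma) \\ & + \mathrm{E}[t(Z_\Gamma, b) \mid \mathcal{A}, \tau \ge \Gamma]\; \mathrm{P}(\mathcal{A} \mid \tau \ge \Gamma) \\ & + \mathrm{E}[\Lambda(Z_\Gamma) \mid \mathcal{B}, \tau \ge \Gamma]\; \mathrm{P}(\mathcal{B} \mid \tau \ge \Gamma) \Big] (1 + o(1)) \quad \text{as } \rho \to 0. \end{aligned} \tag{10.28}$$

In Section 2.8 we will provide approximations for various terms in (10.28) to get an accurate estimate of ADD. In Lemma 2.2 we provide expressions for $\mathrm{ADD}^s$.

Let $\Psi$ represent the Shiryaev recursion, i.e., updating $Z_k$ using only (10.14). Define

$$\nu(x, y) = \inf\{k \ge 1 : \Psi(Z_{k-1}) > y,\ Z_0 = x\}. \tag{10.29}$$

Thus, $\nu(x, y)$ is the time for the Shiryaev test to reach $y$ starting at $x$. Also, define the stopping times:

$$\nu_b = \nu(b, a), \tag{10.30}$$

and

$$\nu_0 = \nu(-\infty, a). \tag{10.31}$$

Note that $\nu_0$ is the stopping time for the classical Shiryaev test [147] and $\nu_b$ is its modified form which starts at $b$. We have the following asymptotic expression.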


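Lemma 2.2. For a fixed $b$ and $\rho$, $\mathrm{ADD}^s$, the average time for $Z_k$ to cross $a$ starting at $b$, under $\mathrm{P}_1$, with $Z_k$ reset to $b$ each time it crosses $b$ from below, is given by

$$\mathrm{ADD}^s = \frac{\mathrm{E}_1[\lambda] + \mathrm{E}_1[t(Z_\lambda, b) \mid \{Z_\lambda < b\}]\; \mathrm{P}_1(Z_\lambda < b)}{\mathrm{P}_1(Z_\lambda > a)} \tag{10.32}$$

and is asymptotically equal to the time taken by the Shiryaev algorithm to move from $b$ to $a$, i.e.,

$$\mathrm{ADD}^s = \mathrm{E}_1[\nu_b]\,(1 + o(1)) \quad \text{as } a \to \infty. \tag{10.33}$$

Note that Theorem 2.3 takes $\rho \to 0$. We now provide another expression for $\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]$, for a fixed $b$ and $\rho$ as $a \to \infty$, which will be used to prove the asymptotic optimality of $\gamma(a, b)$ in Section 2.9.

Theorem 2.4. For a fixed $b$ and $\rho$, we have as $a \to \infty$

$$\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] \le \mathrm{ADD}^s\,(1 + o(1)), \tag{10.34}$$

and hence, we have

$$\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] = \left[\frac{a}{D(f_1, f_0) + |\log(1 - \rho)|}\right](1 + o(1)) \quad \text{as } a \to \infty, \tag{10.35}$$

where $D(f_1, f_0)$ is the K-L divergence between $f_0$ and $f_1$.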




Final  Technical  Report  ARO  MURI  Grant  #  W91  INF-06- 1-0094:  Spatio-Temporal  Nonlinear  Filtering  with  Applications  to  Information  Assurance  and  Counter  Terrorism 


2.7.  Computation  of  ANO 

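First note that

$$\begin{aligned} \mathrm{ANO} &= \mathrm{E}\left[\sum_{k=1}^{\min\{\tau,\,\Gamma-1\}} S_k\right] \\ &= \mathrm{E}\left[\sum_{k=1}^{\Gamma-1} S_k \,\Big|\, \tau \ge \Gamma\right] \mathrm{P}(\tau \ge \Gamma) + \mathrm{E}\left[\sum_{k=1}^{\tau} S_k \,\Big|\, \tau < \Gamma\right] \mathrm{P}(\tau < \Gamma) \\ &= \mathrm{E}\left[\sum_{k=1}^{\Gamma-1} S_k \,\Big|\, \tau \ge \Gamma\right] (1 + o(1)) \quad \text{as } a \to \infty. \end{aligned}$$

The last equality follows because $\sum_{k=1}^{\tau} S_k \le \Gamma$ on $\{\tau < \Gamma\}$, and $\mathrm{P}(\tau < \Gamma) \le e^{-a} \to 0$ as $a \to \infty$.

Following (10.24), we define

$$\hat{\lambda} = \inf\{k \ge 1 : Z_k < b,\ Z_0 = b,\ a = \infty\}. \tag{10.36}$$

The theorem below gives an asymptotic expression for ANO.

Theorem 2.5. For fixed $b$, we have as $a \to \infty$, and as $\rho \to 0$,

$$\mathrm{ANO} = \frac{\mathrm{E}_\infty[\hat{\lambda}]}{\mathrm{P}_\infty[\Gamma \le \hat{\lambda} + t(Z_{\hat{\lambda}}, b)]} \cdot \frac{1}{1 + e^{b}},$$

where $\hat{\lambda}$ is as defined in (10.36).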

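Proof. Let $t(b)$ be the first time $Z_k$ crosses $b$ from below, i.e., $t(b) = t(z_0, b)$. Using the fact that observations are used only after $t(b)$, we can write the following:

$$\mathrm{ANO} = \mathrm{E}\left[\sum_{k=1}^{\Gamma-1} S_k \,\Big|\, \tau \ge \Gamma\right] = \mathrm{E}\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ \tau \ge \Gamma\right] \mathrm{P}(\Gamma > t(b) \mid \tau \ge \Gamma). \tag{10.37}$$

We now compute each of the two terms in (10.37). For the first term in (10.37), we have the following lemma.

Lemma 2.3. For a fixed $b$, as $a \to \infty$, $\rho \to 0$,

$$\mathrm{E}\left[\sum_{k=t(b)}^{\Gamma-1} S_k \,\Big|\, \Gamma > t(b),\ \tau \ge \Gamma\right] = \frac{\mathrm{E}_\infty[\hat{\lambda}]}{\mathrm{P}_\infty[\Gamma \le \hat{\lambda} + t(Z_{\hat{\lambda}}, b)]}\,(1 + o(1)).$$

For the second term in (10.37), we show that $\mathrm{P}(\Gamma > t(b) \mid \tau \ge \Gamma)$ is equal to $\frac{1}{1 + e^{b}}$ in the limit and is independent of $z_0$.

Lemma 2.4.

$$\mathrm{P}(\Gamma > t(b) \mid \tau \ge \Gamma) = \frac{1}{1 + e^{b}} + o(1) \quad \text{as } a \to \infty,\ \rho \to 0.$$

Lemmas 2.3 and 2.4 taken together complete the proof of Theorem 2.5. □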
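
A heuristic reading of the limit in Lemma 2.4 may be helpful; it is offered as intuition under simplifying assumptions and is not the formal proof. Before $t(b)$ no observations are taken, so $p_k$ evolves deterministically by (10.9) and coincides with the prior probability that the change has already occurred by time $k$. Assuming the process starts below $b$, neglecting the small overshoot of $p_k$ above $B$ at time $t(b)$ (which vanishes as $\rho \to 0$), and neglecting the conditioning on $\{\tau \ge \Gamma\}$ (which becomes immaterial as $a \to \infty$), one obtains:

```latex
\mathrm{P}(\Gamma > t(b)) \;=\; 1 - p_{t(b)} \;\approx\; 1 - B
  \;=\; 1 - \frac{e^{b}}{1 + e^{b}} \;=\; \frac{1}{1 + e^{b}},
\qquad \text{using } b = \log\frac{B}{1 - B}.
```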
Define

$$\mathrm{ANO}_1 = \mathrm{E}\left[\sum_{k=\Gamma}^{\tau} S_k \,\Big|\, \tau \ge \Gamma\right].$$

Thus, $\mathrm{ANO}_1$ is the average number of observations used after the change point $\Gamma$. In some applications it might be of interest to have an estimate of $\mathrm{ANO}_1$ as well. The following theorem shows that $\mathrm{ANO}_1$ is approximately equal to the delay itself.

Theorem 2.6. For fixed $b$ and $\rho$, we have

$$\mathrm{ANO}_1 = \mathrm{E}_1[\nu_b]\,(1 + o(1)) \quad \text{as } a \to \infty.$$


2.8.  Approximations  and  Numerical  Results 

In Sections 2.5-2.7, we have obtained asymptotic expressions for ADD, PFA, and ANO as a function of the system parameters: the thresholds $a$, $b$, the densities $f_0$ and $f_1$, and the prior $\rho$. In this section we provide approximations for various analytical expressions obtained in these sections. The observations used are Gaussian with $f_0 \sim \mathcal{N}(0, 1)$ and $f_1 \sim \mathcal{N}(\theta, 1)$, $\theta > 0$, for the simulations and analysis. In the simulations, the PFA values are computed using the expression $\mathrm{E}[1 - p_\tau]$. This guarantees a faster convergence for small values of PFA.
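
The estimator $\mathrm{E}[1 - p_\tau]$ can be sketched as follows. This is a minimal illustration, not the simulation code used for the tables: it assumes a geometric change point with parameter `rho`, the start $p_0 = 0$, unit-variance Gaussian observations with post-change mean `theta`, and the two-threshold policy written in terms of the log-odds thresholds `a` and `b`. Averaging $1 - p_\tau$ avoids having to count rare false-alarm events directly.

```python
import math
import random

def estimate_pfa(a, b, rho, theta, n_paths, seed=0, max_steps=10**6):
    """Monte Carlo estimate of PFA = E[1 - p_tau] for the two-threshold policy."""
    rng = random.Random(seed)
    A = 1.0 / (1.0 + math.exp(-a))     # belief thresholds from a = log(A / (1 - A))
    B = 1.0 / (1.0 + math.exp(-b))
    total = 0.0
    for _ in range(n_paths):
        gamma = 1                       # assumed geometric change point on {1, 2, ...}
        while rng.random() >= rho:
            gamma += 1
        p = 0.0
        for k in range(1, max_steps + 1):
            q = p + (1.0 - p) * rho
            if p >= B:                  # observation taken
                x = rng.gauss(theta if k >= gamma else 0.0, 1.0)
                lr = math.exp(theta * x - 0.5 * theta * theta)
                p = q * lr / (q * lr + (1.0 - q))
            else:                       # observation skipped
                p = q
            if p > A:                   # stop: threshold A crossed
                break
        total += 1.0 - p                # posterior probability of a false alarm
    return total / n_paths

if __name__ == "__main__":
    # Parameters of the last row of Table 10.1 (theta = 0.75, rho = 0.1, a = 4.0, b = -3.0).
    print(estimate_pfa(4.0, -3.0, 0.1, 0.75, n_paths=20000))
```

Because $p_\tau > A$ on every path, any estimate produced this way lies below $1 - A = 1/(1 + e^{a})$.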

2.8.1.  Numerical  results  for  PFA 

By Theorem 2.2, we have the following approximation for PFA:

$$\mathrm{PFA} \approx e^{-a} \int_0^\infty e^{-x}\, dR(x).$$

We note that $\int_0^\infty e^{-x}\, dR(x)$ and $\bar{r}$ can be computed numerically, at least for Gaussian observations [153]. In this section we provide numerical results to show the accuracy of the above expression for PFA.

In Table 10.1 we compare the analytical approximation with the PFA obtained using simulations of $\gamma(a, b)$ for various choices of $\rho$, thresholds $a$, $b$, and post-change mean $\theta$. From the table we see that the analytical approximation is quite good.


Table 10.1: PFA for $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(\theta, 1)$

theta   rho     a     b      PFA (Simulations)   PFA (Analysis)
0.4     0.01    3.0    0     3.78 x 10^-2        3.94 x 10^-2
0.4     0.01    6.0    2.0   1.955 x 10^-3       1.96 x 10^-3
0.75    0.01    9.0   -2.0   7.968 x 10^-5       7.964 x 10^-5
2.0     0.01    5.0   -4.0   2.15 x 10^-3        2.155 x 10^-3
0.75    0.005   7.6    3.0   3.231 x 10^-4       3.235 x 10^-4
0.75    0.1     4.0   -3.0   1.143 x 10^-2       1.157 x 10^-2


In  Table  10.2,  we  show  that  PFA  is  not  a  function  of  b  for  large  values  of  a.  We  fix  a  =  4.6,  and  increase 
b  from  -2.2  to  0.85.  We  notice  that  PFA  is  unchanged  in  simulations  when  b  is  changed  this  way.  This  is 
also  captured  by  the  analysis  and  it  is  quite  accurate. 

2.8.2.  Approximations  and  Numerical  Results  for  ANO  and  ANOi 

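We recall the expressions for ANO from Theorem 2.5 and for $\mathrm{ANO}_1$ from Theorem 2.6:

$$\mathrm{ANO} \approx \frac{\mathrm{E}_\infty[\hat{\lambda}]}{\mathrm{P}_\infty[\Gamma \le \hat{\lambda} + t(Z_{\hat{\lambda}}, b)]} \cdot \frac{1}{1 + e^{b}}, \qquad \mathrm{ANO}_1 \approx \mathrm{E}_1[\nu_b].$$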
Table 10.2: PFA for $\rho = 0.01$, $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$

a      b       Simulations     Analysis
4.6   -2.2     6.44 x 10^-3    6.48 x 10^-3
4.6   -1.5     6.44 x 10^-3    6.48 x 10^-3
4.6   -0.85    6.44 x 10^-3    6.48 x 10^-3
4.6    0       6.44 x 10^-3    6.48 x 10^-3
4.6    0.85    6.44 x 10^-3    6.48 x 10^-3


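We first simplify the expression for ANO. Note that

$$\begin{aligned} \mathrm{P}_\infty[\Gamma \le \hat{\lambda} + t(Z_{\hat{\lambda}}, b)] &= 1 - \mathrm{P}_\infty[\Gamma > \hat{\lambda} + t(Z_{\hat{\lambda}}, b)] \\ &= 1 - \mathrm{E}_\infty\left[(1 - \rho)^{\hat{\lambda} + t(Z_{\hat{\lambda}}, b)}\right]. \end{aligned}$$

Thus, using the binomial approximation $(1 - \rho)^n \approx 1 - n\rho$ for small $\rho$, we get

$$\mathrm{P}_\infty[\Gamma \le \hat{\lambda} + t(Z_{\hat{\lambda}}, b)] \approx \rho\left(\mathrm{E}_\infty[\hat{\lambda}] + \mathrm{E}_\infty[t(Z_{\hat{\lambda}}, b)]\right).$$

Thus, we have

$$\mathrm{ANO} \approx \rho^{-1}\, \frac{\mathrm{E}_\infty[\hat{\lambda}]}{\mathrm{E}_\infty[\hat{\lambda}] + \mathrm{E}_\infty[t(Z_{\hat{\lambda}}, b)]} \cdot \frac{1}{1 + e^{b}}. \tag{10.38}$$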


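We now provide approximations to compute $\mathrm{E}_\infty[\hat{\lambda}]$ and $\mathrm{E}_\infty[t(Z_{\hat{\lambda}}, b)]$ in (10.38). Invoking Wald's lemma [153], we write $\mathrm{E}_\infty[\hat{\lambda}]$ as

$$\mathrm{E}_\infty[\hat{\lambda}] = \frac{\mathrm{E}_\infty[Z_{\hat{\lambda}}] - \mathrm{E}_\infty[\eta_{\hat{\lambda}}]}{-D(f_1, f_0) + |\log(1 - \rho)|}.$$

We have developed the following approximation for $\mathrm{E}_\infty[\hat{\lambda}]$:

$$\mathrm{E}_\infty[\hat{\lambda}] \approx \frac{\bar{r} + \log\left(1 + \rho e^{-b}\right)}{D(f_1, f_0) - |\log(1 - \rho)|}. \tag{10.39}$$

Here, $\log(1 + \rho e^{-b})$ is an approximation to $\mathrm{E}_\infty[\eta_{\hat{\lambda}}]$ obtained by ignoring all the random terms after $b$ is factored out of it. This extra $b$ will cancel with the $b$ in $\mathrm{E}_\infty[Z_{\hat{\lambda}}] = b + \mathrm{E}_\infty[Z_{\hat{\lambda}} - b]$. We approximate $\mathrm{E}_\infty[b - Z_{\hat{\lambda}}]$ by $\bar{r}$, the mean overshoot of the random walk built from the $Y_k$, with mean $D(f_1, f_0) - |\log(1 - \rho)|$, when it crosses a large boundary (see (10.16)). For the term $\mathrm{E}_\infty[t(Z_{\hat{\lambda}}, b)]$, we have obtained the following approximation:

$$\mathrm{E}_\infty[t(Z_{\hat{\lambda}}, b)] \approx \int_0^\infty \frac{\log\left(1 + e^{b}\right) - \log\left(1 + e^{b - x}\right)}{|\log(1 - \rho)|}\, dR(x). \tag{10.40}$$

Thus, we approximate the distribution of $(b - Z_{\hat{\lambda}})$ by $R(x)$.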
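
The integrand in (10.40) has a simple interpretation: when no observations are taken, the log-odds recursion increases $\log(1 + e^{Z_k})$ by exactly $|\log(1 - \rho)|$ per step, so the travel time between two levels is a difference of such terms divided by $|\log(1 - \rho)|$. The sketch below (function names are ours) compares this closed form with a direct iteration of the observation-free recursion; the closed form ignores the rounding up to an integer number of steps, so the two differ by less than one step.

```python
import math

def t_iterated(x, y, rho):
    """Steps of the observation-free recursion (10.15) needed to move Z from x to above y."""
    c = abs(math.log(1.0 - rho))
    z, k = x, 0
    while z <= y:
        z = z + c + math.log(1.0 + rho * math.exp(-z))
        k += 1
    return k

def t_closed_form(x, y, rho):
    """Continuous-time approximation: (log(1 + e^y) - log(1 + e^x)) / |log(1 - rho)|."""
    return (math.log1p(math.exp(y)) - math.log1p(math.exp(x))) / abs(math.log(1.0 - rho))

if __name__ == "__main__":
    x, y, rho = -3.0, -1.38, 0.01      # hypothetical start level, b of Figure 10.6, prior
    print(t_iterated(x, y, rho), t_closed_form(x, y, rho))
```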

Based on the second-order approximation for $\mathrm{E}_1[\nu_0]$ developed in [171], we have obtained the following approximation for $\mathrm{E}_1[\nu_b]$:

$$\mathrm{E}_1[\nu_b] = \frac{a - \mathrm{E}[\eta(b)] + \bar{r}}{D(f_1, f_0) + |\log(1 - \rho)|} + o(1) \quad \text{as } a \to \infty, \tag{10.41}$$

where $\eta(b)$ is the a.s. limit of the slowly changing sequence $\eta_n$ with $Z_0 = b$ under $f_1$ (see (10.17)), and

$$\bar{r} = \int_0^\infty x\, dR(x), \tag{10.42}$$

with $R(x)$ as in Theorem 2.1.

In Table 10.3 we demonstrate the accuracy of the approximations for ANO and $\mathrm{ANO}_1$, for various values of $\rho$, thresholds $a$, $b$, and post-change mean $\theta$. The table shows that the approximations are quite accurate for the parameters chosen.
Table 10.3: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(\theta, 1)$

                                ANO                        ANO_1
theta   rho     a       b       Simulations   Analysis     Simulations   Analysis
0.4     0.01    8.5    -2.2     66.3          62.88        102.9         111.7
0.75    0.01    6.467  -2.2     34.92         34.24        27.86         29.46
2.0     0.01    7.5    -4.0     42.94         46.4         6.08          6.23
0.75    0.005   8.7    -3.0     77.18         75.09        38.73         40.38
0.75    0.1     8.5     0.0     2.64          3.2          21.17         22.18


2.8.3.  Approximations  and  Numerical  Results  for  ADD 
Theorem 2.4 gave a first-order approximation for $\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]$:

$$\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] \approx \frac{a}{D(f_1, f_0) + |\log(1 - \rho)|}.$$

Note that, from [171], this is also the first-order approximation for the ADD of the Shiryaev algorithm, and gives a good estimate of the delay when PFA is small. For the Shiryaev delay, a second-order approximation was developed in [171] (also see (10.41)):

$$\mathrm{E}_1[\nu_0] = \frac{a - \mathrm{E}[\eta(-\infty)] + \bar{r}}{D(f_1, f_0) + |\log(1 - \rho)|} + o(1) \quad \text{as } a \to \infty.$$

So, instead of using $\frac{a}{D(f_1, f_0) + |\log(1 - \rho)|}$, we propose to use the following:

$$\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma] \approx \frac{a - \mathrm{E}[\eta(-\infty)] + \bar{r}}{D(f_1, f_0) + |\log(1 - \rho)|}. \tag{10.43}$$


For the Shiryaev algorithm, (10.43) provides a very good estimate of the delay even for moderate values of PFA. In the case of $\gamma(a, b)$, the accuracy of (10.43) depends on the choice of $b$ and hence on the constraint $\beta$, as having $b > -\infty$ increases the delay. Before we demonstrate this by numerical and simulation results we introduce the following concept:

$$\mathrm{ANO}\% = \mathrm{ANO} \text{ expressed as a percentage of } \mathrm{E}[\Gamma]. \tag{10.44}$$

For example, if $\rho = 0.05$, and for some choice of system parameters $\mathrm{ANO} = 15$, then $\mathrm{ANO}\% = 15 \times 0.05 = 75\%$. Thus, the concept of ANO% captures the reduction in the average number of observations used before change by employing $\gamma(a, b)$.
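
The arithmetic behind (10.44) is a one-liner. The helper below is hypothetical and assumes, as the worked example does, that the mean of the change point is $\mathrm{E}[\Gamma] = 1/\rho$.

```python
def ano_percent(ano, rho):
    """ANO as a percentage of E[Gamma], assuming E[Gamma] = 1 / rho."""
    return 100.0 * ano * rho

if __name__ == "__main__":
    print(ano_percent(15, 0.05))   # the worked example in the text
```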

In Table 10.4 we provide various numerical examples where (10.43) is a good approximation for $\mathrm{E}[\tau - \Gamma \mid \tau \ge \Gamma]$. Since (10.43) is a good approximation for the Shiryaev delay as well, it follows that, for these parameter values, the delay of $\gamma(a, b)$ is approximately equal to the Shiryaev delay. It might be intuitive that if we are aiming for large ANO% values of, say, 90%, then the delay will be close to the Shiryaev delay. But the values in Table 10.4 show that it is possible to achieve considerably smaller values of ANO% without significantly affecting the delay.

However, if the ANO% value is small, then this means that the value of $b$ is large, and this implies that the delay is large. In this case, it might happen that (10.43) is a good approximation only for values of PFA which are very small. This is demonstrated in Table 10.5. It is clear from the table that, for the parameter values considered, estimating the delay with less than 10% error is only possible at PFA values of the order of $10^{-22}$.

This  motivates  the  need  for  a  more  accurate  estimate  of  the  delay.  This  is  provided  below. 
Table 10.4: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(\theta, 1)$

                                ADD (E[tau - Gamma | tau >= Gamma])   PFA
theta   rho     a       b       Simulations   Analysis (10.43)        Simulations      Analysis         ANO%
0.4     0.01    8.5    -2.2     104.9         111.7                   1.608 x 10^-4    1.608 x 10^-4    66%
0.75    0.01    6.467  -2.2     32.3          29.5                    1.002 x 10^-3    1.004 x 10^-3    35%
2.0     0.01    7.5    -4.0     6.1           6.23                    1.77 x 10^-4     1.768 x 10^-4    43%
0.75    0.005   8.7    -3.0     42.6          40.4                    1.076 x 10^-4    1.076 x 10^-4    77%
0.75    0.1     8.5     0.0     23.9          22.18                   1.286 x 10^-4    1.285 x 10^-4    26%


Table 10.5: $\rho = 0.05$, $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$

                E[tau - Gamma | tau >= Gamma]
a       b       Simulations   Analysis (10.43)   ANO%    PFA
5.0     1.0     30            13                 7.5%    4.3 x 10^-3
9.0     1.0     42            25                 7.5%    7.9 x 10^-5
13.0    1.0     54            37                 7.5%    1.4 x 10^-6
18.0    1.0     69            52                 7.5%    9.7 x 10^-9
50.0    1.0     165           149                7.5%    1.23 x 10^-22

From Theorem 2.3, recall that we had the following three events:

$$A = \{Z_\Gamma < b\},$$

$$B = \{Z_\Gamma > b;\ Z_k \searrow b\},$$

$$C = \{Z_\Gamma > b;\ Z_k \nearrow a\}.$$

As a first step towards the approximations, we ignore the event $B$: $P(B) \approx 0$. That is, we assume that if $Z_\Gamma > b$, then $Z_k$ climbs to $a$. Define

$$P_b = P(Z_\Gamma > b \mid \tau > \Gamma).$$


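Then, using (10.28),

$$E[\tau - \Gamma \mid \tau > \Gamma] \approx P_b\, E[\Lambda(Z_\Gamma) \mid C, \tau > \Gamma] + (1 - P_b)\big(E[t(Z_\Gamma, b) \mid A, \tau > \Gamma] + \mathrm{ADD}^s\big). \tag{10.45}$$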


From  Lemma  2.2,  it  is  easy  to  show  the  following: 

ADD5  =  Ei[\\{Zx  >  a}]  +  (E![A|{Za  <  b}}  +  Ex[f(ZA, b)\{Zx  <  6}]) 
We  now  use  the  following  approximations: 


Ei[A|{ZA  >a}] 
E![A|{Za  <  6}] 
E]_[t(ZA,  b)\{Z\  <  b}] 


E[A(Zr)|C,  r  >  r] 


a  —  E[r/(— cx))]  +  r 
D(/i,/o)  +  |log(l  -  p) 


r  +  log(l  +  pe  b ) 
D(/i,/o)  -  |log(l-p)|’ 


$$t(b - r, b) = \frac{\log(1 + e^{b}) - \log(1 + e^{b - r})}{|\log(1 - \rho)|}$$
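As a hedged illustration, the last expression can be read as $t(x, b) = [\log(1 + e^{b}) - \log(1 + e^{x})] / |\log(1 - \rho)|$ evaluated at $x = b - r$. The sketch below evaluates it, assuming that $Z$ is the log-odds of the posterior probability of change and that, while observations are skipped, the posterior evolves through the geometric prior alone. The function name `skip_time` is an illustrative choice, not notation from the report.

```python
import math


def skip_time(x, b, rho):
    """Number of unobserved steps for the log-odds Z to climb from x to b.

    Assumes Z = log(p / (1 - p)) and that, with no observation taken,
    1 - p shrinks by a factor (1 - rho) per step (geometric prior).
    The result is real-valued; no rounding to whole steps is applied.
    """
    if x >= b:
        return 0.0
    numerator = math.log1p(math.exp(b)) - math.log1p(math.exp(x))
    return numerator / abs(math.log(1.0 - rho))


rho, b = 0.05, 1.0
t_full = skip_time(float("-inf"), b, rho)
# Under the stated assumption, (1 - rho)**t(-inf, b) equals 1 / (1 + e^b).
print(t_full, (1.0 - rho) ** t_full, 1.0 / (1.0 + math.exp(b)))
```

Under these assumptions, the probability that the change has not yet occurred after $t(-\infty, b)$ skipped steps is $1/(1 + e^{b})$.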


To compute (10.45), we also need approximations for $P_1(Z_\Lambda < b)$, $P_b$ and $E[t(Z_\Gamma, b) \mid A]$. Those are provided below. Setting $a = \infty$ we have, by Wald's likelihood identity (Proposition 2.24, p. 13, [153]),

$$P_1(Z_\Lambda < b) = E_\infty\!\left[\frac{f_1(X_1) \cdots f_1(X_\Lambda)}{f_0(X_1) \cdots f_0(X_\Lambda)}\right].$$

Under $P_\infty$, $\Lambda$ a.s. ends in $b$ and with high probability it takes very small values. Hence, this expression can be computed using Monte Carlo simulations. Further,

$$P_b = P(\Gamma > t(-\infty, b))\, P(Z_\Gamma > b \mid \Gamma > t(-\infty, b), \tau > \Gamma) \approx \frac{1}{1 + e^{b}} \cdot \frac{E_\infty[\Lambda]}{E_\infty[\Lambda] + E_\infty[t(Z_\Lambda, b)]}.$$

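We already have the approximations for $E_\infty[\Lambda]$ and $E_\infty[t(Z_\Lambda, b)]$ from Section 2.8.2. The approximation for $E[t(Z_\Gamma, b) \mid A]$ can be obtained as follows (all expectations conditioned on $\{\tau > \Gamma\}$):

$$\begin{aligned}
(1 - P_b)\, E[t(Z_\Gamma, b) \mid A] &= (1 - P_b)\, E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\}] \\
&= E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma > t(-\infty, b)\}]\, P(\{\Gamma > t(-\infty, b)\} \cap \{Z_\Gamma < b\}) \\
&\quad + E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma \le t(-\infty, b)\}]\, P(\{\Gamma \le t(-\infty, b)\} \cap \{Z_\Gamma < b\}).
\end{aligned}$$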


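This can be computed using

$$P(\{\Gamma > t(-\infty, b)\} \cap \{Z_\Gamma < b\}) \approx \frac{1}{1 + e^{b}} \cdot \frac{E_\infty[t(Z_\Lambda, b)]}{E_\infty[\Lambda] + E_\infty[t(Z_\Lambda, b)]},$$

and

$$P(\{\Gamma \le t(-\infty, b)\} \cap \{Z_\Gamma < b\}) = P(\{\Gamma \le t(-\infty, b)\}) \approx \frac{e^{b}}{1 + e^{b}}.$$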

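To compute the conditional expectation of $t(Z_\Gamma, b)$, we need to subtract from $t(x, b)$ the mean of $\Gamma$ conditioned on $\{\Gamma \le t(x, b)\}$. Specifically,

$$E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma > t(-\infty, b)\}] = t(b - r, b) - \frac{1}{P(\Gamma \le t(b - r, b))} \sum_{k=1}^{t(b - r, b)} k (1 - \rho)^{k-1} \rho,$$

and

$$E[t(Z_\Gamma, b) \mid \{Z_\Gamma < b\} \cap \{\Gamma \le t(-\infty, b)\}] = t(-\infty, b) - \frac{1}{P(\Gamma \le t(-\infty, b))} \sum_{k=1}^{t(-\infty, b)} k (1 - \rho)^{k-1} \rho.$$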
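The two conditional expectations above share one computation: a horizon $n$ minus the mean of a geometric change point truncated to $\{1, \ldots, n\}$. The sketch below evaluates that quantity for an integer horizon, assuming $P(\Gamma = k) = (1 - \rho)^{k-1}\rho$. The function names are illustrative, and rounding $t(\cdot, b)$ to an integer is left to the caller.

```python
def truncated_geometric_mean(n, rho):
    """E[Gamma | Gamma <= n] for P(Gamma = k) = (1 - rho)**(k - 1) * rho."""
    if n < 1:
        raise ValueError("n must be at least 1")
    prob_within = 1.0 - (1.0 - rho) ** n  # P(Gamma <= n)
    weighted = sum(k * (1.0 - rho) ** (k - 1) * rho for k in range(1, n + 1))
    return weighted / prob_within


def expected_remaining_skip(n, rho):
    """Horizon n minus the conditional mean of the change point."""
    return n - truncated_geometric_mean(n, rho)


# Example with an assumed horizon of 25 skipped steps and rho = 0.05.
print(expected_remaining_skip(25, 0.05))
```

For a horizon of one step the change point must equal 1, so the remaining time is zero, which gives a quick sanity check of the formula.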


Thus we have obtained approximations for all the terms in the new approximation for $E[\tau - \Gamma \mid \tau > \Gamma]$ in (10.45).

In Table 10.6, we reproduce Table 10.5 with a new column containing delay estimates computed using the new ADD (for $E[\tau - \Gamma \mid \tau > \Gamma]$) approximation (10.45). The values show that all estimates are nearly within 10% of the actual value.

In Table 10.7, we show the accuracy of the new ADD approximation (10.45) for various values of the system parameters, by comparing it with simulations and also with (10.43). We also set PFA around $1 \times 10^{-3}$. The table clearly demonstrates that the new ADD approximation predicts ADD with less than 10% error.


2.9. Asymptotic Optimality and Performance of $\gamma(a, b)$

2.9.1. Asymptotic Optimality of $\gamma(a, b)$

In Theorem 2.4 we saw that for a fixed $b$ and $\rho$,

$$E[\tau - \Gamma \mid \tau > \Gamma] = \left[\frac{a}{D(f_1, f_0) + |\log(1 - \rho)|}\right](1 + o(1)) \quad \text{as } a \to \infty.$$

We recall that from Tartakovsky and Veeravalli [171], this is also the asymptotic delay of the Shiryaev algorithm.

Table 10.6: $\rho = 0.05$, $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$

| $a$ | $b$ | Simulations $E[\tau - \Gamma \mid \tau > \Gamma]$ | Analysis (10.43) | New analysis: ADD from (10.45) | ANO% | PFA |
|---|---|---|---|---|---|---|
| 5.0 | 1.0 | 30 | 13 | 34 | 7.5% | $4.3 \times 10^{-3}$ |
| 9.0 | 1.0 | 42 | 25 | 46 | 7.5% | $7.9 \times 10^{-5}$ |
| 13.0 | 1.0 | 54 | 37 | 58 | 7.5% | $1.4 \times 10^{-6}$ |
| 18.0 | 1.0 | 69 | 52 | 73 | 7.5% | $9.7 \times 10^{-9}$ |
| 50.0 | 1.0 | 165 | 149 | 169 | 7.5% | $1.23 \times 10^{-22}$ |
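To make the first-order expression from Theorem 2.4 concrete, the sketch below evaluates $a / (D(f_1, f_0) + |\log(1 - \rho)|)$ for the Gaussian mean-shift case used in the tables. It relies on the standard fact that the Kullback-Leibler divergence between $\mathcal{N}(\theta, 1)$ and $\mathcal{N}(0, 1)$ is $\theta^2 / 2$; the function names are illustrative only.

```python
import math


def kl_gaussian_mean_shift(theta):
    """KL divergence D(f1, f0) for f1 = N(theta, 1) and f0 = N(0, 1)."""
    return 0.5 * theta * theta


def first_order_delay(a, theta, rho):
    """Leading term a / (D(f1, f0) + |log(1 - rho)|) of the delay."""
    return a / (kl_gaussian_mean_shift(theta) + abs(math.log(1.0 - rho)))


# Thresholds from Table 10.6 with theta = 0.75 and rho = 0.05.
for a in (5.0, 9.0, 13.0, 18.0, 50.0):
    print(a, round(first_order_delay(a, 0.75, 0.05), 1))
```

For $a = 50$ the leading term is about 150, close to the value 149 that Table 10.6 lists for (10.43), while the simulated delay is 165. The first-order term alone therefore does not capture the extra delay incurred by skipping observations.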

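Table 10.7: $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$, PFA $\approx 10^{-3}$, ANO = 10% of Shiryaev ANO

| $\rho$ | $a$ | $b$ | ADD, simulations | ADD, new analysis (10.45) | ADD, analysis (10.43) | ANO% |
|---|---|---|---|---|---|---|
| 0.01 | 6.4 | 2.7 | 250 | 260 | 14.42 | 0.33% |
| 0.005 | 6.45 | 0.6 | 181 | 190 | 22.09 | 1.5% |
| 0.001 | 6.47 | -2.7 | 75 | 80 | 33.68 | 7.6% |
| 0.0005 | 6.47 | -3.49 | 74 | 79 | 36.49 | 8.4% |
| 0.0001 | 6.47 | -5.2 | 76 | 80 | 42.56 | 9.6% |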

Moreover, from Theorem 2.2, the PFA for $\gamma(a, b)$ is

$$\mathrm{PFA} = \left(e^{-a} \int e^{-x}\, dR(x)\right)(1 + o(1)) \quad \text{as } a \to \infty.$$

Again from Tartakovsky and Veeravalli [171], this is the PFA for the Shiryaev algorithm. We thus have the following asymptotic optimality result for $\gamma(a, b)$.

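Theorem 2.7. With $\gamma = \{\tau, S_1, \ldots, S_\tau\}$ define

$$\Delta(\alpha, \beta) = \{\gamma : \mathrm{PFA}(\gamma) \le \alpha;\ \mathrm{ANO}(\gamma) \le \beta\},$$

then for a fixed $\beta$ and $\rho$,

$$\frac{\mathrm{ADD}(\gamma(a(\alpha, \beta), b(\alpha, \beta)))}{\inf_{\gamma \in \Delta(\alpha, \beta)} \mathrm{ADD}(\gamma)} = 1 + o(1) \quad \text{as } \alpha \to 0. \tag{10.46}$$

Here, for each $\alpha$, $\beta$, $b(\alpha, \beta)$ is the smallest $b$ such that $\mathrm{ANO}(\gamma(a(\alpha, \beta), b(\alpha, \beta))) \le \beta$ as $a \to \infty$.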


Proof. Fix $b$ such that $\mathrm{ANO}(\gamma(a, b)) \le \beta$ as $a \to \infty$. It may happen that the constraint $\beta$ is not met with equality. Thus choose the smallest $b$ which satisfies the constraint $\beta$ as $a \to \infty$. This choice of threshold $b$ is unique for a given $\beta$ because ANO is not a function of threshold $a$ as $a \to \infty$.

As $a \to \infty$, the PFA and ADD approach the Shiryaev PFA (10.21) and the Shiryaev delay (10.35), respectively. Thus, as $a \to \infty$, $\gamma(a, b)$ is optimal over the class of all control policies $\Delta(\alpha, \beta)$ that satisfy the constraints $\alpha$ and $\beta$. □


Remark 2.2. If we select $b_1$ such that ANO% $\le$ 1% as $a \to \infty$, and then select $a_1$ such that $\mathrm{ADD}(\gamma(a_1, b_1))$ is within 1% of $\frac{a_1}{D(f_1, f_0) + |\log(1 - \rho)|}$, then for thresholds $a > a_1$, $\gamma(a, b_1)$ has ANO% $\le$ 1% and delay within 1% of the Shiryaev delay. Thus, for small values of PFA, as long as we are aiming for ANO% of 1-100, no other control policy can outperform the two-threshold policy $\gamma(a, b)$.

2.9.2. Trade-off Curves: Performance of $\gamma(a, b)$ for a Fixed and Moderate $\alpha$

Theorem 2.7 shows that for small values of PFA, $\gamma(a, b)$ is approximately optimal, i.e., it is not possible to outperform $\gamma(a, b)$ by a large margin. But for moderate values of PFA, it is not clear whether there exist algorithms that can significantly outperform $\gamma(a, b)$. Our aim is to partially address this issue in this section.

In Figure 10.7 we plot the ANO-ADD trade-off for the two-threshold algorithm. Specifically, we compare the two-threshold algorithm with the classical Shiryaev test and study how much ANO can be reduced without losing significantly in terms of ADD. For Figure 10.7 we pick four values of $\rho$: 0.05, 0.01, 0.005, 0.001. For a fixed $\rho$, we fix $b = -\infty$ and select threshold $a$ such that $\mathrm{PFA}(\gamma(a, b)) = 10^{-4}$. We then increase the threshold $b$ to obtain ANO% values of 75%, 50%, 30%, 15%. We note that it was possible to reduce the ANO to 15% of $E[\Gamma]$ by increasing the threshold $b$ this way, without affecting the probability of false alarm. Figure 10.7 shows that we can reduce ANO by up to 25% while getting approximately the same ADD performance as that of the Shiryaev test. Moreover, if we allow for a 10% increase in ADD compared to that of the Shiryaev test, then we can reduce ANO by up to 70% (see plot for ANO% = 30%).
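The design procedure described above (fix $b = -\infty$, tune $a$ for the PFA, then raise $b$ to cut ANO) can be explored with a small Monte Carlo sketch. The code below is not the authors' simulator. It assumes a geometric prior with parameter $\rho$, a Gaussian mean shift from $\mathcal{N}(0, 1)$ to $\mathcal{N}(\theta, 1)$, a standard Bayes posterior update, and the rule "skip the next sample when the posterior log-odds is below $b$, stop when it reaches $a$". All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Map log-odds z to a probability; handles z = -inf."""
    if z == float("-inf"):
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))


def run_two_threshold(a, b, rho, theta, rng):
    """One run; returns (stopping time, change point, samples used before change)."""
    upper, lower = sigmoid(a), sigmoid(b)
    change_point = 1
    while rng.random() >= rho:  # geometric prior on the change point
        change_point += 1
    p, k, used_before_change = 0.0, 0, 0
    while p < upper:
        take_sample = p >= lower  # assumed on-off rule: observe only if p >= lower
        k += 1
        p_pred = p + (1.0 - p) * rho  # prior-only prediction step
        if take_sample:
            mean = theta if k >= change_point else 0.0
            x = rng.gauss(mean, 1.0)
            lr = math.exp(theta * x - 0.5 * theta * theta)  # f1(x) / f0(x)
            p = p_pred * lr / (p_pred * lr + (1.0 - p_pred))
            if k < change_point:
                used_before_change += 1
        else:
            p = p_pred
    return k, change_point, used_before_change


def estimate(a, b, rho, theta, runs=2000, seed=0):
    """Monte Carlo estimates of (PFA, E[(tau - Gamma)^+], ANO)."""
    rng = random.Random(seed)
    false_alarms, delay_sum, used_sum = 0, 0, 0
    for _ in range(runs):
        tau, gamma, used = run_two_threshold(a, b, rho, theta, rng)
        false_alarms += tau < gamma
        delay_sum += max(tau - gamma, 0)
        used_sum += used
    return false_alarms / runs, delay_sum / runs, used_sum / runs


print(estimate(5.0, float("-inf"), 0.05, 0.75))  # no skipping (Shiryaev-like)
print(estimate(5.0, 1.0, 0.05, 0.75))            # with on-off observation control
```

Comparing the two printed triples for the same $a$ shows the trade-off in miniature: raising $b$ lowers the number of pre-change samples at the price of a longer delay. The numbers depend on the assumptions listed above and are not meant to reproduce Figure 10.7.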


Figure 10.7: Trade-off curves comparing performance of the two-threshold algorithm with the Shiryaev test for ANO% of 75, 50, 30 and 15%. $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(1, 1)$, and PFA $= 10^{-4}$.

Such behavior was also observed in Table 10.4, where we saw that the delay for $\gamma(a, b)$ is approximately equal to the Shiryaev delay for moderate to large ANO% values. Thus, for moderate PFA values, when the ANO% is moderate to large, $\gamma(a, b)$ is approximately optimal.

2.9.3.  Comparison  with  Fractional  Sampling 

In this section we compare the performance of $\gamma(a, b)$ with the naive approach of fractional sampling, in which an ANO% of $\epsilon$ is achieved by employing the Shiryaev algorithm and using each sample with probability $\epsilon$. Figure 10.8 compares the two schemes for ANO% of 25%. We also plot the performance of the Shiryaev algorithm for the same values of PFA and $\rho$. The figure clearly shows that $\gamma(a, b)$ reduces the observation cost by a significant margin as compared to the fractional sampling scheme.

Figure 10.8: Trade-off curves comparing performance of the two-threshold algorithm with the fractional sampling scheme for ANO% = 25%. $f_0 \sim \mathcal{N}(0, 1)$, $f_1 \sim \mathcal{N}(0.75, 1)$, and PFA $= 10^{-3}$.

2.10. Conclusions

We posed a data-efficient version of the classical Bayesian quickest change detection problem, where we control the number of observations taken before the change occurs. We obtained a two-threshold Bayesian test that is asymptotically optimal, has good trade-off curves and is easy to design. We supported our claim via analytical and simulation results. We derived analytical approximations for the ADD, PFA and ANO performance of the two-threshold policy, which can be used to design the test by choosing the thresholds. Further, there is a unique pair of thresholds that meets a given set of constraints on the probability of false alarm and observation cost. This result has implications for many engineering applications where an abrupt change has to be detected in a process under observation, but there is a cost associated with acquiring the data needed to make accurate decisions.

In the absence of knowledge of the prior on $\Gamma$, an important problem for future research is to see if two-threshold policies are optimal in non-Bayesian (e.g., minimax) settings. More importantly, it is of interest to understand how to update the test metric in a non-Bayesian setting when we skip an observation. From an application point of view, one can design a two-threshold test based on the Shiryaev-Roberts or CUSUM approaches (Tartakovsky and Moustakides [167]), and use the undershoot of the metric when it goes below the threshold $b$ to design the off times. Furthermore, if we are able to find useful lower bounds on delay for given false alarm and ANO constraints, we may be able to use these to prove asymptotic optimality of such heuristic tests, as is done for the standard quickest change detection problem (Tartakovsky and Veeravalli [171], Lai [82]). Also, such lower bounds can possibly help in obtaining insights for cases where the observations are not i.i.d. (Tartakovsky and Veeravalli [171], Lai [82]). Other interesting problems in this area include the design of data-efficient optimal algorithms for robust change detection or nonparametric change detection.

Chapter 11

Spectral and Measurement Approaches in
Information Assurance


This chapter summarizes the work done by the group of Christos Papadopoulos in the area of Spectral Analysis Applications in Network Security. The work proceeds on three fronts. The first is applications of a low-rate detection algorithm [14]. The second is a study correlating address characteristics of spammers and non-spammers to determine whether spammers have different characteristics and are therefore easier to detect [183]. We present highlights of these two efforts below. The third is work to detect bots with custom TCP/IP stacks, based on the existence of multiple fingerprints [44]. Due to space restrictions we do not present this work here.

1.  Using  Low-Rate  Flow  Periodicities  in  Anomaly  Detection 

As desktops and servers become more complicated, they employ an increasing amount of automatic, non-user-initiated communication. Such communication can be good (OS updates, RSS feed readers, and mail polling), bad (keyloggers, spyware, and botnet command-and-control), or ugly (adware or unauthorized peer-to-peer applications). Communication in these applications is often periodic but infrequent, perhaps every few minutes to a few hours. This infrequent communication and the complexity of today's systems make these applications difficult for users to detect and diagnose. We show that there are several classes of applications that show low-rate periodicity and demonstrate that they are widely deployed on public networks. In this work we present a new approach to identify changes in low-rate periodic network traffic. We employ signal-processing techniques, using discrete wavelets implemented as a fully decomposed, iterated filter bank. This approach allows us to cover a large range of low-rate periodicities, from seconds to hours, and to identify approximate times when traffic changed. Network administrators and users can use our techniques for network- or self-surveillance. To measure the effectiveness of our approach, we show that it can detect changes in periodic behavior caused by events such as installation of keyloggers, an interruption in OS update checks, or the P2P application BitTorrent. We quantify the sensitivity of our approach, showing that we can find periodic traffic when it is at least 5-10% of overall traffic.

1.1.  Methodology 

We  use  wavelets  implemented  as  an  iterated  filter  bank  to  identify  periods  of  time  when  a  periodic  series  of 
connections  is  present.  In  this  section  we  discuss  how  we  go  from  network  events  to  identifying  a  change  in 
periodic  communication. 
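The idea of a fully decomposed, iterated filter bank can be sketched in a few lines. The example below bins event timestamps into a time series and splits every band into low-pass and high-pass halves at each level, then reports the share of energy in each band. The Haar filter pair is an assumption made for brevity (the wavelet family actually used is not specified here), and all function names are hypothetical.

```python
import math


def bin_events(timestamps, bin_width, n_bins):
    """Count events per time bin to form the input time series."""
    series = [0.0] * n_bins
    for t in timestamps:
        i = int(t // bin_width)
        if 0 <= i < n_bins:
            series[i] += 1.0
    return series


def haar_split(signal):
    """One filter-bank stage: orthonormal Haar low-pass and high-pass outputs."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return low, high


def full_decomposition(series, levels):
    """Split every band at every level (wavelet-packet style), natural band order."""
    bands = [series]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands


def band_energy_fractions(bands):
    """Fraction of total signal energy that falls in each band."""
    energies = [sum(v * v for v in band) for band in bands]
    total = sum(energies)
    return [e / total if total else 0.0 for e in energies]


# A connection every 8 seconds, observed in 64 one-second bins.
series = bin_events([8.0 * k for k in range(8)], bin_width=1.0, n_bins=64)
bands = full_decomposition(series, levels=3)
print(band_energy_fractions(bands))
```

A detector along these lines would flag bands whose energy fraction exceeds a threshold. Choosing that threshold, and mapping band indices to periods, are the parts this sketch leaves out.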

Although wavelets provide a well developed mathematical theory, and there has been some work applying wavelets to network traffic before, discovering infrequent periodic traffic is particularly demanding because of the long timescales and sparse signals involved. Here we describe the four main parts of our approach (roughly following the outline of applying signal processing to networking [178]): extracting a timeseries of events from network traffic, decomposing the timeseries using an iterated filter bank, visualizing the resulting multi-resolution representation, and detecting the presence of a periodic signal. Our focus on long timescales influences each of these steps.

For a complete description of the work please see the paper; here we summarize the results in the following tables.


Table 11.1: Variety of applications that show periodic behavior

| Category | Examples | Seen? | Period |
|---|---|---|---|
| User services | WeatherEye | yes | 30-120 |
| RSS News Feeds | NewzCrawler | yes | 15-120 |
| Web Counters | Google Analytics | yes | 5-30 |
| P2P Protocols | Gnutella | yes | 2-120 |
| Adware | Gator, ISTbar | yes | 15-60 |
| Spyware/Keyloggers | Spy Buddy | no | N/A |
| Botnet cmd&ctl | (non-commercial) | no | N/A |

Table 11.1 outlines seven categories of applications which we have researched and identified as participating in periodic communication. We have found numerous examples of applications in five of these seven categories in our four-day trace from USC. However, example applications alone do not show how widespread hosts running periodic applications are.


Table 11.2: Prevalence of malware with periodic behavior on our network.

| Group | Blacklisted destinations | Unique IPs (users) |
|---|---|---|
| active to anywhere | - | 128,614 [100%] |
| active to blacklisted | 181 (100%) | - |
| Non-periodic | 120 (66%) | n/a |
| Periodic | 61 (34%) | n/a |
| User Services | 5 (3%) | 22 [0%] |
| Web Counters | 15 (8%) | 16,405 [13%] |
| Ad Servers | 36 (20%) | 31,277 [24%] |
| Other | 5 (3%) | 6 [0%] |

Table 11.2 shows the results of our analysis. We found traffic to 181 of the blacklisted destinations from our campus. About 45,000 IP addresses at USC had traffic to some of these sites, roughly one-third of all active campus addresses. (The presence of dynamic addresses means that this count may not correspond exactly to 45,000 users, since one user may occupy multiple addresses, and vice versa.)

For the 61 blacklisted hosts that had periodic traffic, we manually examined each site and classified it in one of four categories (user services, web counters, ad servers, and other).

1.2.  Summary 

In this work we have shown that low-rate periodicity is common to several broad classes of applications, good (OS updates), bad (keyloggers and malware), and ugly (adware), and that these applications are widely deployed on public networks. We have explored a wavelet-based approach to identify such periodic behavior, and begun to explore the sensitivity and robustness of this approach. A promising application of such analysis is self-surveillance, in which a user watches his or her own traffic to detect unexpected changes.

1.3.  Applications  of  Low  Rate  Detection 

In the previous section we investigated the underlying fundamentals of detection of low-rate periodic behavior and evidence that periodic behavior occurs. In this section we look at two applications, self-surveillance and network surveillance, and then demonstrate that a variety of applications show low-rate periodicity and that those applications occur in real networks.

1.3.1.  Self-surveillance:  Identifying  Changes  in  Periodic  Behavior  of  a  Host 

Earlier  we  demonstrated  that  malware  shows  periodic  behavior  that  can  be  identified,  even  in  the  face  of 
noise.  Now  we  show  how  to  identify  dynamic  changes  in  periodic  behavior,  and  the  time  when  these 
changes  occur. 

We wish to identify changes in the periodic behavior of a given host to help users better understand activities on their computer. All operating systems and an increasing number of applications automatically poll for updates periodically. In addition, spyware and adware often report back to or request new information from the external master. In fact, applications sometimes do not reveal the presence of automatic polling, or how much information they disclose. Moreover, malware may terminate automatic updates after infecting a host. Thus, users will want to know when automatic checks stop, or when an automatic reporting service is added to their machines.

We  consider  two  examples  where  the  change  in  periodic  behavior  is  of  interest,  namely  a  change  in  OS 
update  checks  and  a  change  in  communication  patterns  created  by  the  installation  of  a  keylogger. 

Detecting operating system updates. Security policies of all operating systems and many applications include automatic polling for updates with typical periods ranging from 30 minutes to a week. Just as network administrators wish to detect the presence of bad behavior, the absence of good behavior may also be of great interest. In addition, since automatic updates are often disclosed only in the fine print of an end-user license agreement, users may also wish to know when a newly installed application performs regular update checks.

To confirm we can see a change in update checks we monitored a lab machine running the Fedora 10 distribution of Linux for three days. By default, Fedora polls update servers every hour using yum-updatesd. During the second day of the experiment, we disabled update checks at 2pm. The machine was lightly used for web browsing and e-mail over the three-day period.

Our  system  correctly  identifies  periodic  behavior  near  3600s  in  this  test  traffic.  Our  algorithm  to  place 
events  in  time  finds  a  change  in  this  periodic  behavior  between  noon  and  9pm,  consistent  with  our  known 
time  of  2pm.  We  have  not  tuned  our  algorithm  for  temporal  placement;  a  more  sophisticated  approach  would 
most  likely  narrow  this  window,  although  precision  is  ultimately  limited  by  the  1-hour  period. 

To understand how our system can automatically identify the absence of OS checks, Figure 11.1 shows traffic periodicity with and without OS update checks. At the 16th level of decomposition in Figure 11.1(b) we see OS update polling appear as energy at the base period of one hour (3600s, two adjacent 3% bins), and as harmonics at half, three fourths and one and a half times the frequency (6553s, 4% energy; 4800s, 4%; and 2400s, 3% energy). Disabling updates, by contrast, shows no energy below the 14th level of decomposition (Figure 11.1(c)).

Our algorithms detect periodicity automatically with an adaptive threshold. Figure 11.1(a) shows the numeric comparison corresponding to the whole 72-hour observation. As we can see, each of the periodicities visible in Figure 11.1(b) is above the detection threshold in Figure 11.1(a). More importantly, because detection is numeric, it can be automated and can be more sensitive and consistent than human interpretation.

This example demonstrates that our method successfully identifies a periodic behavior, and can also identify when that behavior starts and stops. While in some cases system administrators may be able to directly monitor OS update polling if they have administrative access to the machines in question, we suggest our approach could be useful when only network access is possible or desirable, for example, due to privacy reasons. In addition, monitoring periodic checks is robust to a potentially changing set of servers hosting OS updates.

(a) Automatic detection of OS updates: energy vs. detection threshold, all 72 hours.
(b) Traffic with automatic polling for OS updates.
(c) Traffic without automatic update polling.

Figure 11.1: Visualization illustrating periodic behavior before and after removal of OS update checks.


1.3.2. Detecting a Keylogging Application

OS updates are an example of desirable periodic behavior. We next look at an example of a periodic behavior which is undesirable, namely a keylogger. Many keyloggers report on user activity at specified intervals, to inform their masters what they have learned and that they are still operational; we confirmed supervisor-configured reporting intervals in both SpyBuddy and Keyboard Guardian.

Experiment: installing Keyboard Guardian on a test machine. To investigate whether we can detect keylogger reporting we installed Keyboard Guardian on a dedicated Windows computer. We monitored all TCP flows from the test machine for a three-day period while using the test machine for occasional e-mail and web browsing.

On the second day of the experiment, we installed Keyboard Guardian at 4pm, and configured Keyboard Guardian to email reports every three hours. Our computer use compared to keylogger reporting resulted in an SNR of 0.1. Figures showing these results are omitted here due to space, but are available in our technical report [14].

We ran our system on trace files collected from the test machine, and it correctly identified not only the periodic behavior but also the frequency of the reporting period (10,800s, 92 µHz). Additionally, the system identified the presence of the signal between 12pm and 9pm on the second day of our experiment, correctly bracketing the 4pm installation time.

This experiment shows we can detect low-rate but regular traffic as well as changes in periodic communication associated with a known spyware tool. We anticipate that this approach could be used by a network administrator to monitor a large number of user machines, searching for malicious activity. Although centralized companies could do such monitoring more easily by modifying software on individual machines, some companies (for example, Google) and most ISPs do not have this ability. While such network monitoring is possible today with centrally maintained blacklists, our approach detects behavioral changes that would apply to malware before the control site is blacklisted. After detection, network administrators could take action to further investigate, perhaps notifying the machine's owner or subjecting that host to more invasive monitoring or quarantine.

2.  Correlating  Spam  Activity  with  IP  Address  Characteristics 

It is well known that spam bots mostly utilize compromised machines with certain address characteristics, such as dynamically allocated addresses, machines in specific geographic areas and IP ranges from ASes with more tolerant spam policies. Such machines tend to be less diligently administered and may exhibit less stability, more volatility, and shorter uptimes. However, few studies have attempted to quantify how such spam bot address characteristics compare with non-spamming hosts. Quantifying these characteristics may help provide important information for comprehensive spam mitigation.

In this work, we use two large datasets, namely a commercial blacklist and an Internet-wide address visibility study, to quantify address characteristics of spam and non-spam networks. We find that spam networks exhibit significantly less availability and uptime, and higher volatility than non-spam networks. In addition, we conduct a collateral damage study of a common practice where an ISP blocks the entire /24 prefix if spammers are detected in that range. We find that such a policy blacklists a significant portion of legitimate mail servers belonging to the same prefix.

For brevity, we present only the results of the last part of our work in this report. Our full results can be found in the paper that appeared in Global Internet 2010.

2.1.  Collateral  Damage 

Our prior work has shown that both address and hostname characteristics confirm that spam originates from dynamic addresses. We use these results to consider a new question: Is blacklisting an entire /24 prefix based on the presence of one or more spamming hosts an effective policy? While many blacklists enumerate individual IP addresses, blocking entire /24 prefixes is also common. We are concerned about reducing spamming, but also about the blocking of legitimate outgoing email. We define collateral damage as the number of legitimate mail servers which would be incorrectly filtered when an entire /24 prefix is blacklisted. First we identify all survey prefixes which have spamming hosts. If these prefixes also contain non-spamming hosts, then they are subject to collateral damage. Figure 11.2 compares the number of spammers versus non-spammers in the set of intersected prefixes.

Except for outliers, the graph shows that many of the prefixes seem to cluster along the left axis (grey) or the top diagonal (black). The diagonal is present because the sum of spammers and non-spammers is never more than the size of a prefix (255). The left-axis cluster shows prefixes with a reasonably uniform number of non-spammers and a small number of spammers. The diagonal cluster shows a large number of spammers residing in highly populated prefixes. These clusters may reflect two different situations. The diagonal cluster shows heavily compromised prefixes, which we believe may have negligent administration or a collaborating provider. The other cluster represents a limited number of compromised hosts in an otherwise normal prefix; we believe these may be caused by bots. The latter are prone to collateral damage, since they contain a high number of non-spamming hosts and a low number of spamming hosts.

Anti-spammers typically assume there is no collateral damage in blacklisting a /24 prefix, because many ISPs forward legitimate mail through the ISP's mail server, rather than allowing hosts to send mail directly. We are only able to quantify whether a blacklisted prefix contains mail sources, by studying their hostnames and DNS mail forwarding records. For this study we extract the reverse hostname and MX record for each address in the prefix, using the Linux host and dig commands. Finally we intersect the mail server IP addresses against the survey dataset to see if any reside in the blacklisted prefixes. Table 11.3 shows the progression of our data analysis.
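The final intersection step can be sketched with Python's standard `ipaddress` module. This is only an illustration of the idea, not the scripts used in the study: the function names are hypothetical, the addresses come from the reserved documentation ranges, and the DNS lookups that produce the list of mail servers are assumed to have been done already.

```python
import ipaddress


def prefix24(address):
    """Return the /24 network that contains an IPv4 address string."""
    return ipaddress.ip_network(f"{address}/24", strict=False)


def collateral_damage(spammer_ips, mail_server_ips):
    """Mail servers (and their prefixes) caught by blocking spammers' /24s.

    A mail server counts as collateral damage if it sits in a /24 that
    contains at least one spammer but is not itself listed as a spammer.
    """
    blocked = {prefix24(ip) for ip in spammer_ips}
    spammers = set(spammer_ips)
    hit_servers = {
        ip for ip in mail_server_ips
        if prefix24(ip) in blocked and ip not in spammers
    }
    hit_prefixes = {prefix24(ip) for ip in hit_servers}
    return hit_servers, hit_prefixes


# Illustrative data only (documentation address ranges).
spammers = ["192.0.2.17", "198.51.100.5"]
mail_servers = ["192.0.2.25", "203.0.113.9", "198.51.100.5"]
servers, prefixes = collateral_damage(spammers, mail_servers)
print(sorted(servers), sorted(str(p) for p in prefixes))
```

Counting the two returned sets gives the two collateral-damage figures of the kind reported in Table 11.3: affected mail servers and affected prefixes.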

We start with 646,040 addresses that reside in the 4,126 spamming prefixes in our intersection set. We subtract addresses that time out or fail to return a valid domain name. From the remainder, our programs identify a set of unique domain names, and the addresses of the corresponding mail servers. Intersecting these addresses with our spamming prefixes, we find collateral damage of 1,377 mail servers and 365 prefixes, which is ~8.8% of all spamming prefixes. We conclude that prefix blocking incurs a high rate of collateral damage, suggesting the need for finer-grain filtering.

Figure 11.2: Number of Spammers versus Non-Spammers

Table 11.3: Collateral Damage Study

| Description | Domains | Hosts | Prefixes |
|---|---|---|---|
| Intersected prefixes | | 646,040 | 4,126 |
| Domain Query Timeout | | 12,899 | |
| Domain Query Invalid | | 175,535 | |
| Domain Query Valid | | 457,606 | |
| Unique Domain Names | 4,044 | | |
| Number Mail Servers | | 6,718 | |
| Unique Mail Servers | | 3,872 | 2,154 |
| Collateral Damage | | 1,377 | 365 |

Chapter 12

Application of Quickest Changepoint
Detection to Information Assurance and
Cybersecurity


This  chapter  is  intended  to  assess  the  progress  made  in  applied  changepoint  detection.  Specifically,  we 
considered  two  major  areas  of  application:  information  assurance  and  cybersecurity.  The  work  in  both  was 
a joint effort of Dr. Tartakovsky (University of Southern California, Department of Mathematics), Dr. Papadopoulos
(Colorado State University, Department of Computer Science) and Dr. Heidemann (University
of  Southern  California,  Department  of  Computer  Science  and  Information  Sciences  Institute). 

1.  Introduction 

One of the important applications that stimulated the research in this project on the development of efficient
distributed changepoint detection methods is intrusion detection in distributed high-speed computer
networks. A significant number of serious cyberattacks on a variety of governmental agencies, universities,
and corporations have been identified [78]. These attacks, including a variety of buffer overflow,
worm-based, denial-of-service (DoS) and man-in-the-middle (MiM) attacks, are designed to gain access to
additional hosts, steal sensitive data, and disrupt network services. As a result, rapid detection of a wide
spectrum of network intrusions and robust separation of legitimate and malicious traffic are vital for the
continuation of normal operation of military, federal, industrial, and enterprise networks.

A wide variety of intrusion detection methods has been proposed in the literature [43]. Intrusion detection
systems (IDSs) fall into two broad categories: (a) signature based and (b) anomaly based [43, 78]. The main
problem with signature IDSs is that the signatures must be defined a priori; thus, this technique is ineffective
against new attacks. In anomaly-based detection, the IDS is first trained to recognize “normal” traffic patterns
and then classifies deviations as attacks. The problems with this approach are a high rate of false positives,
the cost of training and re-training, and susceptibility to carefully crafted attacks that “train” themselves into
normal traffic.

Typically, network intrusions occur at unknown points in time and lead to changes in the statistical properties
of certain observables. For example, distributed DoS (DDoS) attacks lead to changes in the mean
value of the number of packets of a particular type (TCP, ICMP, or UDP) and size. It is therefore intuitively
appealing to formulate the problem of detecting attacks as a quickest changepoint detection problem: to detect
changes in statistical models as rapidly as possible (i.e., with minimal average delays) while maintaining
the false alarm rate at a given low level.
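
As an illustration of this formulation, the following is a minimal sketch of a CUSUM test for the simplest case: a shift in the mean of independent Gaussian observations with known variance. The Gaussian model, the parameter values, the synthetic data, and the function name are assumptions made for illustration only; they are not the models or settings used in the project. Lowering the threshold shortens the detection delay at the price of more frequent false alarms.

```python
def cusum_alarms(samples, mu0, mu1, sigma, threshold):
    """CUSUM test for a mean shift mu0 -> mu1 in Gaussian data.

    The statistic accumulates log-likelihood ratios, is clipped at
    zero, and is restarted from zero after every alarm. Returns the
    sample indices at which the threshold was reached.
    """
    w = 0.0
    alarms = []
    for n, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2).
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        w = max(0.0, w + llr)
        if w >= threshold:
            alarms.append(n)
            w = 0.0  # restart the procedure after an alarm
    return alarms


# Synthetic packet counts: the mean shifts from 50 to 60 at index 100.
pre_change = [50, 52, 48, 51, 49] * 20
post_change = [60, 62, 58, 61, 59] * 4
alarms = cusum_alarms(pre_change + post_change,
                      mu0=50.0, mu1=60.0, sigma=2.0, threshold=10.0)
print("first alarm at sample", alarms[0])
```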

In  this  project,  we  developed  not  only  an  efficient  anomaly  IDS  based  on  changepoint  methods,  but  also 
a  hybrid  anomaly-signature  IDS  which  allows  for  filtering  of  false  positives  and  confirmation  of  real  attacks. 
Thus, this novel hybrid approach allows us to overcome common drawbacks of both anomaly and signature
methods when applied separately.

2. The Hybrid Anomaly-Signature Intrusion Detection System

2.1.  The  Idea  and  Structure  of  the  System 

In this project, we proposed a novel hybrid approach to network intrusion detection which is particularly
efficient for capturing DDoS attacks. The hybrid anomaly-signature Intrusion Detection System (IDS) implements
the change detection algorithm (anomaly IDS) and the spectral-based signature IDS in parallel. In
other words, the methodology is based on using the changepoint detection method for preliminary detection
of attacks, and the discrete Fourier transform to reveal periodic patterns in network traffic which can be used to
confirm the presence of attacks and reject false detections triggered by the anomaly IDS. It is worth mentioning
that in network security applications it is of utmost importance to detect very rapidly attacks that
may occur in the distant future (using a repeated application of the same anomaly-based detection algorithm),
in which case the true detection of a real change may be preceded by a long interval with frequent false
alarms that should be filtered (rejected) by a separate algorithm, which may be built based on signatures
(e.g., spectral signatures). At the second stage, we propose to exploit a spectral-based IDS.

The architecture of an automated two-stage (cascade) hybrid anomaly-signature IDS is shown schematically
in Figure 12.1. The IDS utilizes changepoint detection for preliminary intrusion detection, and the discrete
Fourier transform to confirm true and reject false intrusions. Such a hybrid approach simultaneously speeds
up detection and lowers the frequency of false alarms.


Figure 12.1: The hybrid intrusion detection system.

To illustrate how the hybrid IDS works, consider detecting a DoS attack. A DoS attack is a malicious
attempt to disrupt (ideally, completely knock out) an online service. This can be achieved, e.g., by sending
a large number of packets to the target (victim) to congest its link. Consequently, once the victim’s link is
overloaded, it starts to clock out attack packets at regular intervals. For example, trying to push more
than 10 Mbps of traffic through a 10 Mbps link will clock out packets at approximately 800 packets per second
if the packets are 1500 bytes in size each. This periodicity, though mixed with other non-attack traffic
towards the target’s network, will result in an easily detectable spike in the spectrum at the corresponding
frequency. At the same time, since DoS attacks lead to changes in the statistical properties of traffic data,
changepoint detection can be effectively used to detect these changes. The idea of the hybrid IDS is to
use changepoint detection as an “early warning” system, and once it sounds an alarm, turn on the spectral
analyzer for a more thorough traffic analysis.
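
The clock-out rate quoted above follows from dividing the link capacity by the packet size in bits. The short check below assumes that framing overhead is ignored; the function name is hypothetical.

```python
def saturated_packet_rate(link_bps, packet_bytes):
    """Packets per second leaving a fully loaded link."""
    return link_bps / (packet_bytes * 8)


rate = saturated_packet_rate(10e6, 1500)
print(f"{rate:.1f} packets per second")
```

Under these assumptions the rate is roughly 833 packets per second, consistent with the approximate figure of 800 given in the text, so the spectral spike would be expected near that frequency.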


Figure 12.2: Power spectral density. (a) No attack (pre-change); (b) Under attack (post-change). Horizontal axis: frequency, kHz.

Figure 12.2 gives an example of applying the FFT (fast Fourier transform) to a real data set
containing an attack. Note that in the no-attack mode there are no periodic patterns in the traffic distribution,
and hence, there is no peak in the spectrum, while under the attack there is a pronounced peak suggesting that
this might indeed be an attack. It is this phenomenon that is used to filter false positives and confirm true
attacks. Taken separately, anomaly- and signature-based IDSs have pros and cons. Combining them in one unit
allows us to obtain the best possible performance.
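
A spectral confirmation test of this kind might be sketched as follows. The sketch uses a naive discrete Fourier transform so that it needs only the standard library (a production system would use an FFT), and the peak-to-mean ratio of 10 is an arbitrary illustrative threshold, not a value taken from the project. Both test signals are synthetic.

```python
import cmath
import math


def periodogram(x):
    """Power in DFT bins 1..N/2 of a sequence with its mean removed."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        s = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                for t, c in enumerate(centered))
        power.append(abs(s) ** 2 / n)
    return power


def has_spectral_peak(x, ratio=10.0):
    """Confirm an alarm if the strongest bin dominates the average bin."""
    p = periodogram(x)
    return max(p) > ratio * (sum(p) / len(p))


n = 128
# Periodic traffic: a sinusoidal component with 16 cycles per window.
periodic = [100 + 20 * math.cos(2 * math.pi * 16 * t / n) for t in range(n)]
# A single isolated burst: its spectrum is flat, so there is no peak.
burst = [0.0] * n
burst[40] = 50.0

print(has_spectral_peak(periodic), has_spectral_peak(burst))
```

The isolated burst stands in for a transient that could trigger a false alarm in the anomaly stage but carries no periodic signature, so the spectral stage rejects it.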

In summary, the hybrid anomaly-signature IDS is based on the following principles and has the following
features:

•  Anomaly IDS - Quick Detection with High False Alarm Rate (FAR): In order to detect attacks quickly, detection
thresholds in the changepoint detection module are lowered, which leads to frequent false alarms that
should be filtered by a separate algorithm.

•  Signature IDS - False Alarm Filtering: To reject false detections, a signature-based approach is used,
based on a spectral analysis module.

•  Changepoint Detection Module for: (a) Quick detection with relatively high FAR; and (b) Triggering
of spectral analysis algorithms.

•  Spectral Analysis Module for: (a) False alarm filtering/rejection; and (b) True attack confirmation.

Figure 12.3 illustrates the hybrid anomaly-spectral IDS in action. The first plot shows raw data (packet
rate). The second plot shows the behavior of the CUSUM statistic, which is restarted from scratch
(repeated) whenever a threshold exceedance occurs. The third plot shows the PSD (power spectral density) at the
output of the spectral analyzer: the peak appears only when the attack starts (which confirms the attack),
while previous threshold exceedances (false alarms) are rejected by the spectral analyzer.

Figure 12.3: Hybrid IDS in action.


2.2.  Detection  of  DoS  Attacks 

In this subsection, we illustrate the efficiency of the hybrid IDS for detecting UDP packet storm attacks, one
of the common DoS attacks. Specifically, we report and discuss the results obtained from the empirical
performance analysis of the hybrid IDS using a real-life distributed DoS attack, namely, a packet storm attack
on User Datagram Protocol (UDP) port 22. This trace was captured off one of the ServePath¹ networks. The
attack starts about 60 seconds into the trace, and consists of very short packets (about 15 bytes in size each)
sent to the victim’s UDP port 22. The attack rate is about 180 Kpps, while that of the background traffic is about
53 Kpps. Although the attack is pronounced in intensity, it is quite short in duration: only about 10
seconds long. This poses a challenge for the hybrid system.

Figure 12.4 shows the cumulative packet rate. There is a considerable jump in the packet
rate at the moment the attack begins.


Figure  12.4:  Storm  attack  on  UDP  Port  22  (cumulative  packet  rate). 

¹ See www.servepath.com for more information.

Figure 12.5 shows the corresponding FFT output. The pronounced peak in the middle of the plot suggests
that this might indeed be an attack. It is this phenomenon that we use to filter false positives in the hybrid
system.

We now turn to a discussion of how the proposed system can also be used to isolate short attacks.
Figure 12.6 illustrates the difference between not using the spectral analyzer and using it to confirm the
attack. From Figure 12.6(a) we see that the first attack is detected while the second one is not. This is
because we did not rely on the spectral analyzer and had to use a high detection threshold. Consequently, we
had almost no false alarms but we also failed to detect the second attack. By contrast, Figure 12.6(b)
shows the case where we lowered the detection threshold, which increased the level of false alarms, but the
false alarms were successfully filtered by the spectral analyzer. As can be seen, in this case we detected
and isolated both attacks.


Figure  12.5:  Spectral  density  for  the  UDP  attack. 


(a)  No  detection 


(b)  Successful  detection 


Figure 12.6: UDP attack double peak detection.

3. Application to Spam Detection: Fighting Spam at the Network Level

Spam is a well-known problem, and the need for an efficient way to shield against spam is evident. As
an anti-spam solution, most organizations run some version of spam filters at their local networks (e.g.,
Bayesian filters or some sort of block lists). These techniques work quite well, but they are typically expensive
both in initial and operational costs. Block lists rely on information that was gathered ahead of time, and
thus might be stale. Bayesian approaches, while generally good, are not infallible and require examination
of all message content.

The idea of our approach is to monitor traffic at the network level. This has several advantages: it
requires no message content examination and thus guards privacy; spammers can be detected almost instantly
based on their network behavior; collateral damage is reduced because dynamic addresses released
by spammers can be removed from block lists quickly; and IP addresses can be blocked before connections
are accepted, saving resources at the mail server.

What features are useful for detecting spammers? Prior work has shown that the Autonomous System
the IP address belongs to, the message size, the number of blocked connections and the message length
are important features. All of them can be determined from network traffic. We investigated such features
and used changepoint detection to detect when traffic patterns from a particular host match known spammer
patterns. Examples include the following:

1. Message size. Most spam campaigns attempt to deliver identical or similar (content-wise)
messages to many recipients. As a result, with the exception of the receiver’s IP address, the size
of the message does not vary significantly. Thus, detecting hosts that send email blocks of a similar
size is one feature to look for. False positives (e.g., from mailing lists) can be eliminated by looking
at other features such as past history and the presence of MX records associated with that IP address.

2. Dropped connections. If a mail server uses a block list to refuse connections from suspected spammers,
these will be detected in network traces. Keeping track of such events can help detect spammers
for everyone. Changepoint detection can be used to detect a change in the number of dropped connections
from a particular IP address.

3. Connection patterns. To lower the probability of being detected, spammers typically send very
few emails to a particular domain. With network monitoring, however, which can monitor many
domains at once, this particular pattern can be detected. Changepoint detection can be used to detect
a spammer touching many different domains.
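
As an illustration of the third feature, the sketch below turns a list of observed SMTP connections into a per-host time series of the number of distinct destination domains contacted in each time window. Such a series could then serve as the input to a changepoint detector. The record layout, the window length, the function name, and the sample events are hypothetical.

```python
from collections import defaultdict


def domains_touched(events, window_sec):
    """Count distinct destination domains per source host and window.

    events: iterable of (timestamp_sec, source_ip, destination_domain).
    Returns {source_ip: {window_index: distinct_domain_count}}.
    """
    seen = defaultdict(set)  # (source, window index) -> set of domains
    for ts, src, domain in events:
        seen[(src, int(ts // window_sec))].add(domain)
    series = defaultdict(dict)
    for (src, window), domains in seen.items():
        series[src][window] = len(domains)
    return {src: dict(sorted(counts.items())) for src, counts in series.items()}


# Hypothetical connection records (documentation addresses and domains).
events = [
    (1, "192.0.2.7", "a.example"),
    (2, "192.0.2.7", "b.example"),
    (3, "192.0.2.7", "a.example"),
    (65, "192.0.2.7", "c.example"),
    (5, "198.51.100.3", "a.example"),
]
series = domains_touched(events, window_sec=60)
print(series)
```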

We analyzed a number of network traces containing spam-related activity in order to better understand
the signature message size pattern (data mining) and subsequently use it in a changepoint-based detector to
defeat spam. Figure 12.7 summarizes our findings.

Specifically, Figure 12.7 shows the evolution of the email size in time (for real-world data). The main
observation is that the email size, though it has occasional spikes, is flat most of the time. A host having such
a pattern is a potential spammer. We equipped a changepoint-based detector with the ability to check for
such patterns to confirm that a certain host is a spammer. We tested the whole system using an example with
a real-world spammer. The whole process is shown in Figure 12.8. Specifically, under surveillance is the
email (SMTP) traffic generated by a certain host. Ordinarily, the SMTP traffic produced by a user sending
legitimate messages is characterized by a relatively steady intensity, i.e., the number of messages sent per
unit time remains more or less constant, with no major bursts or drops. However, the behavior changes
completely once a spam attack is initiated: the number of messages sent explodes, possibly for a very
short period of time. The topmost plot in Figure 12.8 illustrates just this. The spike in the traffic intensity
that appears at the far right of the plot can be detected by methods of statistical changepoint detection. We
considered the two most popular detection procedures: the CUSUM and Shiryaev-Roberts procedures. The
middle and bottom plots in Figure 12.8 show the detection process for each of the two procedures. Either
procedure raises an alarm almost immediately, as soon as the traffic intensity burst caused by the spam attack is
encountered.

Figure 12.7: The message size pattern of a typical spammer.

Figure 12.8: Detection of a spammer. Horizontal axis: n, sec (sample).

At the point the alarm is sounded, the system checks the message size pattern, and based on the fact that
the message size is fairly stable (standard deviation is very small) concludes that this is a legitimate alarm.
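
The two steps, a changepoint alarm followed by a message size check, might be sketched as follows. The Shiryaev-Roberts statistic is computed with its standard recursion R_n = (1 + R_{n-1}) L_n, where L_n is the likelihood ratio of the n-th observation. The Gaussian model for the message counts, all parameter values, the 5% bound on the coefficient of variation, and the sample data are assumptions made for illustration and are not taken from the experiments.

```python
import math
import statistics


def shiryaev_roberts_alarm(counts, mu0, mu1, sigma, threshold):
    """Index of the first Shiryaev-Roberts alarm, or None if there is none."""
    r = 0.0
    for n, x in enumerate(counts):
        # Gaussian log-likelihood ratio for a mean shift mu0 -> mu1.
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        r = (1.0 + r) * math.exp(llr)
        if r >= threshold:
            return n
    return None


def sizes_look_like_spam(sizes, max_cv=0.05):
    """True if message sizes are nearly constant (small relative spread)."""
    return statistics.pstdev(sizes) / statistics.mean(sizes) <= max_cv


# Hypothetical messages sent per second: steady, then a burst.
counts = [2, 3, 2, 3, 2, 3, 2, 3, 20, 22, 21]
alarm = shiryaev_roberts_alarm(counts, mu0=2.5, mu1=20.0, sigma=2.0,
                               threshold=100.0)

# Hypothetical message sizes (bytes) observed from the same host.
sizes = [4100, 4102, 4098, 4100, 4101]
confirmed = alarm is not None and sizes_look_like_spam(sizes)
print(alarm, confirmed)
```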

4.  Application  to  Repulse  Unauthorized  Break-Ins 

Yet another major type of computer security risk is when a system (whether an individual computer or an
entire network) is broken into by an unauthorized party. An event of this kind, i.e., an unauthorized break-in
to a system, is associated with gaining access to a machine without permission in order to subsequently tamper with
it. Some of the short-term consequences of a successful unauthorized break-in include:

1. Stealing sensitive data;

2.  Turning  the  machine  into  a  relay  to  send  junk  email  (spam); 

3.  Bringing  down  the  entire  network  just  for  the  fun  of  it. 

Long-term, the consequences may snowball into a matter of national security. The need to devise a
defensive mechanism is thus apparent. We proposed an efficient solution against such threats.

To illustrate our approach, consider a hacker seeking to break into a machine. This process may be
described as consisting of two phases. During Phase 1, the hacker launches a dictionary attack (see below)
attempting to guess a valid username-password combination (ideally, the root one), to subsequently use
it to get through to the machine’s shell; in real life, the hacker may obtain a valid username-password
combination via other means, such as phishing, social engineering, etc. Next, assuming the hacker
was successful in gaining access to the machine’s shell, the break-in process rolls on to Phase 2. During the
latter, having successfully logged in, the hacker proceeds to “playing” with the machine. The specific activities vary
from case to case, but common to most is downloading malware onto the machine and opening a backdoor
(e.g., for a possible accessory as well as for later returns). Thus, an unauthorized break-in is a two-phase
process, where each phase is characterized by its own unique features in terms of the traffic patterns it
generates. Therefore, an individual approach to detection of each is required. Both can be designed using
a changepoint detection based anomaly IDS, or using the hybrid anomaly-signature IDS. However, the
problem with simply employing two independent detectors is a high level of false positives: it is the level of
the first detector plus that of the second one. To overcome this, observe that the two phases are correlated:
the traffic pattern generated by Phase 1 activities is followed by the pattern generated by Phase 2 activities.
Factoring this correlation in helps reduce the overall frequency of false positives.
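
One simple way to factor in this correlation is to accept a Phase 2 alarm only if a Phase 1 alarm occurred shortly before it. The sketch below is an assumption about how such a rule might look, not the rule used in the project; the function name, the 60-second gap, and the alarm times are hypothetical.

```python
def correlated_alarms(phase1_alarms, phase2_alarms, max_gap):
    """Keep Phase 2 alarms that follow a Phase 1 alarm within max_gap seconds."""
    confirmed = []
    for t2 in phase2_alarms:
        if any(0 <= t2 - t1 <= max_gap for t1 in phase1_alarms):
            confirmed.append(t2)
    return confirmed


# Hypothetical alarm times in seconds from the two detectors.
phase1 = [10, 300]
phase2 = [12, 150, 330, 900]
confirmed = correlated_alarms(phase1, phase2, max_gap=60)
print(confirmed)
```

Phase 2 alarms that are not preceded by a Phase 1 alarm are discarded, so an isolated false alarm from either detector does not by itself produce a break-in report.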

We  performed  a  simulation  of  an  unauthorized  break-in  to  evaluate  the  performance  of  our  approach. 
The  test-bed  was  half  real  traffic,  and  half  simulated.  Specifically,  we  modeled  Phase  1  using  real-life  traces. 
However,  since  a  trace  of  a  real  attack  where  a  hacker  has  actually  broken  into  a  machine  is  hard  to  obtain, 
we  simulated  Phase  2.  We  now  describe  each  phase  separately,  beginning  with  the  first  one. 

4.1.  Phase  1:  Dictionary  Attack 

Recall that with a dictionary attack the hacker attempts to guess a correct username-password combination
to break into a server, typically through SSH. While we illustrate the attack with SSH, the attack applies
to any username-password access control method, including web authentication and other similar methods.
To achieve this goal the attacker initiates what is essentially a brute force attack: a rapid sequence of
SSH authorization requests, where each request contains a username-password combination either guessed
based on prior partial information about the real username-password, or drawn from common user names and
passwords. The word “dictionary” in this context is used figuratively to illustrate that the attacker has a list
(dictionary) of “suspected” username-password combinations. In a dictionary attack the hacker successively
tries all of them. Figure 12.9 illustrates this kind of attack schematically.

Consider an example of a real-world SSH dictionary attack. This dataset was provided by a regional ISP.
Figure 12.10(a) shows the intensity of the number of packets passing through the victim server’s link. Notice
that the server remains idle most of the time, occasionally exhibiting interaction with other computers in the
network. Eventually, the server starts to receive suspiciously many SSH requests. Figure 12.10(b) shows the
behavior of the CUSUM detection statistic. The behavior of the latter is very similar to that of the packet
rate. As soon as the server is under attack, the statistic jumps through the detection threshold (red flat line),
and an alarm is raised resulting in successful detection of the attack.

Figure 12.9: Generic dictionary attack scenario.

(b) CUSUM detection statistic.
Figure 12.10: SSH dictionary attack traffic pattern and its detection.

The instantaneous and successful detection described above is not surprising because the attack is very
pronounced. Therefore, it is not a good illustration of the potential of changepoint detection. To make things
more challenging, we intentionally diminished the intensity of the attack. The new dataset is well suited to
demonstrate not only the potential of changepoint detection, but also that of the hybrid system. The spectral
approach is expected to work because, as mentioned earlier, dictionary attacks introduce periodicity in the
traffic flow. Figure 12.11 illustrates this for this attack. Specifically, the figure is a magnified version of the
patterns in the attack traffic.

Figure 12.11: SSH dictionary attack signature.

Looking at Figure 12.11 we see a highly periodic sequence of pronounced spikes. If one employs a spectral
analyzer, the power spectral density will have a high spike. Recall that this is exactly the idea behind
the hybrid system. We now report our results for the new dataset. Figure 12.12 illustrates the detection
process. The main conclusion is that despite the fact that the intensity of the attack now is far less than
before, changepoint detection reveals it instantaneously anyway.

Figure 12.12: Diminished SSH dictionary attack traffic pattern and its detection.

The problem, though, is that if the detection threshold is lowered so as to obtain even quicker
detection, it will inevitably result in numerous false alarms. To filter these false alarms, one can employ
the FFT to uncover hidden periodicities in the traffic flow caused by the presence of attacks. Figures 12.13(a)
and 12.13(b) show the power spectral density for this dataset before and during the attack. Note the spike in
the PSD under attack. This is exactly because of the aforementioned periodicity.

To conclude, we remark that using the anomaly-spectral IDS, one can achieve unprecedented speeds of
detection.

(a) Before attack. (b) Under attack.

Figure 12.13: Spectral characteristics of the traffic before and during the attack.


4.2.  Phase  2:  Post  Unauthorized  Break-In  Activity 

When trying to detect a compromised machine by looking at network traffic only, the challenges are considerable. Due to
privacy concerns we assume we cannot look inside packets to determine what traffic is malicious. Moreover,
even if we had access to the packet payload, many machines are compromised through applications such
as SSH, which means that the payload is encrypted. We simulate an attack scenario where a hacker breaks
into a machine using a compromised SSH password. Note that this covers successful dictionary attacks
(such as those we see in Phase 1), and stealing username-password information. We assume that the hacker
performs several suspicious activities that normal users are not likely to perform. These include downloading
a malicious binary file and running it, fetching a few more binaries from an external server, creating a
backdoor for future connections into the compromised machine, and uploading a few files to an external
server. We take into account that some of these activities, such as access to the backdoor, may happen after
the hacker has logged out.

Although  this  scenario  is  generic,  it  is  fully  customizable:  the  hacker  activities  can  be  altered,  timing 
between  commands  can  be  changed,  and  as  a  result,  any  scenario  we  deem  plausible  can  be  simulated. 
Currently,  the  simulation  consists  of  the  following  steps: 

1.  Create  an  attack  script.  This  is  a  set  of  shell  commands  representing  what  the  attacker  will  do  once 
logged  into  the  machine. 

2.  Create  traffic  to  the  machine.  This  shell  script  is  run  after  compromise,  and  accesses  the  back  door 
the  original  attack  script  has  created. 

3.  Run  a  network  capture  tool  on  the  machine  to  capture  all  attack  traffic  (we  use  tcpdump). 

4.  Run  the  attack  scripts. 

The above scenario was implemented, resulting in a trace about 38 seconds long with about 35K packets.
The target machine’s IP address is 129.82.138.26. The machine is assumed to be running such standard
network services as HTTP, FTP, POP, SMTP, SSH, etc. These services typically run on ports whose number
is less than 1024. Backdoors, however, typically run on higher port numbers. A big challenge is to detect
the correlation between the SSH communications and deviations (which may be very slight) in the number
of traffic packets in and out of the machine. In our scenario, an SSH login is followed by an increase in
the number of incoming or outgoing packets, which is a strong indication of suspect activity by the user
who just logged in. Thus, not only do we want to detect that the behavior of the traffic flow (both incoming
and outgoing) generated by the machine has changed, but also to correlate the change with a particular login
through SSH.

The surveillance begins from the very first SSH packet sent by the hacker in an attempt to log in to the
machine. Our approach is to treat every connection as suspicious, and create state to monitor all successful
connections. In this simple scenario it takes the hacker about 5 seconds to start “fooling around” with the
machine: 5 seconds after the beginning of the surveillance, the hacker executes a command that causes the
machine to increase its traffic. This increase is readily and successfully detected by the changepoint detection
based anomaly IDS, as seen in Figure 12.14. Note that the scenario involves both incoming and outgoing traffic,
which can be used to reduce false positives.


Figure  12.14:  Detection  of  unauthorized  SSH  break-in  attempt. 
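
The per-connection monitoring state described above might be organized as in the following sketch, which starts counting at the first SSH packet sent to the monitored host and produces per-second inbound and outbound packet counts for a changepoint detector to consume. The packet record layout, the function name, and the sample addresses (reserved documentation ranges, not the host used in the experiment) are hypothetical.

```python
from collections import Counter


def traffic_after_login(packets, host, ssh_port=22):
    """Per-second packet counts for `host` from the first SSH packet onward.

    packets: iterable of (timestamp_sec, src_ip, dst_ip, dst_port),
    sorted by time. Returns (start_time, inbound, outbound), where the
    two dictionaries map seconds since start_time to packet counts.
    """
    start = None
    inbound, outbound = Counter(), Counter()
    for ts, src, dst, dport in packets:
        if start is None:
            if dst == host and dport == ssh_port:
                start = ts  # surveillance begins with the first SSH packet
            else:
                continue
        second = int(ts - start)
        if dst == host:
            inbound[second] += 1
        elif src == host:
            outbound[second] += 1
    return start, dict(inbound), dict(outbound)


# Hypothetical trace: an SSH login followed by outbound traffic 5 seconds later.
packets = [
    (0.5, "198.51.100.9", "192.0.2.1", 80),
    (1.0, "198.51.100.9", "192.0.2.1", 22),
    (1.2, "192.0.2.1", "198.51.100.9", 50000),
    (6.1, "192.0.2.1", "203.0.113.4", 80),
    (6.3, "192.0.2.1", "203.0.113.4", 80),
]
start, inbound, outbound = traffic_after_login(packets, host="192.0.2.1")
print(start, inbound, outbound)
```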


5.  Conclusion 

Rapid intrusion detection in high-speed computer networks with a low false alarm rate is a challenge for
military, government and industrial networks. This problem has been addressed by various agencies, universities,
and companies for many years. Still, this problem is not solved.

We applied quickest changepoint detection methods for the development of efficient anomaly-based
IDSs that are capable of detecting attacks in computer networks with small detection delays for a given
(low) false alarm rate. The effectiveness of the proposed changepoint detection based IDS has been verified
by implementation in real-world scenarios for detecting ARP MiM insider attacks and DDoS TCP, ICMP,
and UDP external attacks as well as for detecting spam and unauthorized break-ins. These results prove that
major drawbacks and technological barriers of existing intrusion detection systems can be overcome through
the use of a completely new approach to intrusion detection that relies on advanced changepoint detection
methods. In particular, replacing current ad-hoc deterministic decision rules with advanced changepoint
detection algorithms allows for controlling the false alarm rate and detecting unexpected intrusions.

In addition, combining an anomaly IDS and spectral-based signature algorithms with false alarm
filtering capability allows one to lower thresholds in the anomaly IDS, which reduces detection delays to a
minimum. Therefore, the proposed approach to rapid intrusion detection integrates the best possible
anomaly-based (statistical) solutions with a spectral-based signature IDS in distributed systems. The feasibility of
the proposed hybrid IDS has been demonstrated for detecting DDoS attacks by using LANDER data sets collected
by ISI. We believe that the proposed hybrid anomaly-spectral IDS addresses both aspects of the problem: it
simultaneously provides a breakthrough in terms of speed of detection (i.e., small
detection delays for true attacks) and a very low false alarm rate.
Chapter 13

Application  of  Adaptive  Spatiotemporal 
Image  Processing  and  Nonlinear  Filtering 
Methods  in  Remote  Sensing 


The work reported in this chapter has been performed mainly by Alexander Tartakovsky. The results related
to NLF-based track-before-detect are joint work with Boris Rozovsky.

1.  Introduction 

The problem of efficient clutter rejection is a challenge for Space-Based Infrared System (SBIRS) and Space
Tracking and Surveillance System (STSS) sensors with chaotically vibrating lines-of-sight (LOS) that have to
provide early detection and tracking of targets (e.g., missile launches) in the presence of highly structured
cloud backgrounds. In such systems, the intensities of cluttered backgrounds due to solar scattering by
clouds, aerosols and the earth's surface (ground, sea, etc.) or by IR airglow emissions are typically dozens and
even hundreds of times greater than either sensor noise or the intensities of the targets that are to be detected
and then tracked. As a result, reliable target detection and subsequent tracking are impossible without clutter
rejection to, or even below, the level of sensor noise.

Most existing clutter rejection technologies for unstabilized or mechanically stabilized platforms rely on
spatial-only or simple image differencing methods. However, the results of our study presented below show
that even the best spatial-only filters are not efficient enough, especially in unfavorable conditions that are
typical for applications of interest. Moreover, these filters cannot be combined with temporal processing in
cases where clutter is non-stationary in time due to platform vibrations (jitter), which is always the case. A
similar conclusion holds for the industry-standard differencing method.

In  this  project,  we  argue  that  the  level  of  clutter  suppression  required  for  reliable  detection  and  tracking 
can  be  achieved  only  by  implementing  novel  spatiotemporal  image  processing  methods  rather  than  spatial 
filtering  alone.  Note  that  in  order  to  make  temporal  processing  efficient,  clutter  rejection  algorithms  must 
be  supplemented  by  very  accurate  jitter  estimation  and  scene  stabilization  techniques  that  compensate  for 
platform vibrations and eliminate residual frame misalignment. Our image registration/stabilization
techniques are entirely different from those previously used. Stabilization is performed iteratively in the course
of clutter rejection, and the corresponding stabilization algorithm is a part of the clutter rejection
architecture. We show that this novel approach is extremely efficient, allowing for very accurate interpolation and
image reconstruction in a wide variety of conditions.

However,  for  a  variety  of  moving  platforms,  one  may  expect  difficult  scenarios  where  even  after  clutter 
rejection  the  effective  SNR  is  low,  so  that  the  in-frame  detection  with  a  given  acceptable  FAR  is  not  possible. 
The  only  way  to  overcome  this  difficulty  is  to  develop  efficient  track-before-detect  architectures  based  on 
spatial-temporal NLF methods. These algorithms should process a series of frames simultaneously, without
detecting targets in each frame, by accumulating information along hypothetical tracks.
2. The Developed System

In  this  research,  we  focus  on  developing  algorithms  and  software  for  adaptive  clutter  suppression,  target 
detection  and  multiple  target  tracking  for  a  variety  of  observation  conditions;  tuning  and  optimization  of 
these algorithms for particular scenarios; and testing and validation through synthetic simulations and
processing of real data. The primary goal is to develop a viable prototype of the multiple target tracking system
that  includes  a  reconfigurable,  adaptive  clutter  rejection  (CLUR)  system  that  can  be  tested  using  a  built-in 
simulator  which  mimics  real  environments.  The  developed  system  and  corresponding  software  tools  have 
the  following  functionalities  and  capabilities: 

1.  Built-in  generator  of  image  sequences  with  moving  point  illumination  sources  (targets),  background 
clutter  due  to  cloud  cover,  jitter  due  to  platform  vibrations,  and  sensor  noise. 

2.  A  bank  of  image  processing  algorithms  (CLUR  filters)  with  a  reconfigurable  architecture  and  ability 
to  compensate  for  strong  signals  from  bright  targets  and  outliers. 

3.  Auto-tuning  and  auto-selection  algorithms  that  allow  for  automatic  selection  of  the  optimal  filter  from 
the  bank  for  current  conditions. 

4.  In-frame  detection  algorithms  with  constant  false  alarm  rate  (stabilization  of  false  alarms). 

5.  ONF-based  multitarget  tracking  algorithms,  in  particular: 

(a)  Changepoint  detection-based  track  initiation  algorithms 

(b)  Identification  of  detections  as  belonging  to  existing  tracks 

(c)  Forming  new  tracks  and  deletion  of  false  tracks 

(d)  Changepoint  detection-based  track  termination 

6.  Graphical  user  interface  (GUI)  for  visualization  of  the  results  of  processing  and  for  input  data  and 
parameters. 

A  general  block-diagram  of  the  system  with  the  corresponding  data/signal  processing  flow  is  shown  in 
Figure  13.1. 


Figure  13.1:  Block-diagram  of  the  CLUR-detection-tracking  system. 


This  system  exploits  advanced  spatiotemporal  image  processing  methods  for  clutter  rejection  with  strong 
signal  compensation,  changepoint  detection  based  track  initiation  and  termination  algorithms,  NLF-based 
target  tracking  algorithms,  and  global  data  association  (optimal  for  association  of  all  detections  but  not 
locally  optimal  for  a  particular  detection),  among  others. 
3. Spatiotemporal Image Processing Algorithms for Clutter Rejection

The developed baseline CLUR technique is based on a multi-parametric approximation of clutter
(regression-type modeling) that leads to an adaptive spatial-temporal filtering (image processing) algorithm. The
“coefficients” of the filter are calculated adaptively according to the minimum distance algorithm. The adaptive
spatial-temporal  filter  allows  one  to  suppress  any  background,  regardless  of  its  spatial  variation.  It  not  only 
whitens  the  data  but  also  corrects  all  translational  and  rotational  distortions.  The  results  of  an  experimental 
study  presented  in  Section  5  show  that  the  proposed  algorithm  gives  a  substantial  gain  compared  to  the  best 
existing  spatial  techniques  as  well  as  to  the  industry  standard  temporal  differencing  method. 

We start with a description of the basic idea and a generic CLUR algorithm for the class of parametric
problems that involve parametric approximations of the function $b_n(r)$ describing the background clutter. This approach
was first proposed by Tartakovsky and Blazek [166]. Note that we do not make any assumptions on the
statistical properties of clutter. All we assume is that clutter is an arbitrary function of the spatial coordinates
(possibly a quite sharp one) and a slowly varying function of time within a certain time interval m.

The  basic  iterative  CLUR  algorithm  that  includes  jitter  compensation  has  the  following  form: 

1. Initialization. This step can be performed in various ways. Typically it requires about $m$
observations, and the results of the initialization stage are the pilot estimates $\hat\theta_k(m)$,
$(\hat\delta_1, \dots, \hat\delta_m)$, and the background estimate

$$\hat b_m(r_{ij}) = \sum_{k=1}^{M} \hat\theta_k(m)\, f_k(r_{ij} - \hat\delta_m),$$

where $\{f_k\}$ is the orthogonal basis (Fourier, wavelets, splines), $\delta$ denotes the unknown shifts due to
vibrations (jitter), and $\theta$ denotes the parameters in the space-time splitting decomposition
(approximation) of the clutter $b_n(r)$. Initialization schemes include autonomous algorithms for estimating the
shifts between two frames based on simple spline approximations.

2. Typical Step $n$, $n > m$.

(a) Jitter Estimation. The estimate $\hat b_{n-1}(r_{ij})$ obtained from the previous step is compared with the
n-th frame (starting with $n = m$), and the ML/MD estimate of jitter $\hat\delta_n$ is computed as the solution of the
nonlinear optimization problem with

$$\hat b_{n-1}(r_{ij} - \delta) = \sum_{k=1}^{M} \hat\theta_k(n-1)\, f_k(r_{ij} - \delta).$$

(b) Estimation of Parameters. Having the estimates $\hat\delta_{n-m+1}, \dots, \hat\delta_n$, the estimates $\hat\theta_k(n)$ are
computed for the n-th frame from the least squares minimization problem by comparing the observed frame
(image) $Z_n(r_{ij})$ with the model. This recomputing in the corrected coordinate system is equivalent to frame
alignment.

(c) Clutter Estimation. Using the estimates obtained from the previous steps, compute the estimate of
$b_n(r_{ij})$ for all $i$ and $j$ in the corrected coordinate system,

$$\hat b_n(r_{ij}) = \sum_{k=1}^{M} \hat\theta_k(n)\, f_k(r_{ij} - \hat\delta_n). \qquad (13.1)$$

(d) Clutter Rejection. Using the estimate (13.1), compute the residuals (filtered background)

$$\tilde Z_n(r_{ij}) = Z_n(r_{ij}) - \hat b_n(r_{ij}) = Z_n(r_{ij}) - \sum_{k=1}^{M} \hat\theta_k(n)\, f_k(r_{ij} - \hat\delta_n).$$
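
The typical step (a)-(d) can be sketched in a few lines. This is only an illustration under strong simplifying assumptions that are not part of the original algorithm description: a one-dimensional periodic "frame", a small Fourier basis, noise-free data, and a grid search over candidate shifts in place of the nonlinear optimization. All function names, the grid of candidate shifts, and the toy data are hypothetical.

```python
import numpy as np


def basis(r, n_harmonics, period):
    """Evaluate the Fourier basis f_k at positions r; one column per function."""
    cols = [np.ones_like(r)]
    for h in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * h / period
        cols += [np.cos(w * r), np.sin(w * r)]
    return np.stack(cols, axis=1)


def estimate_jitter(frame, theta, r, candidates, n_harmonics, period):
    """Step (a): candidate shift whose shifted model is closest to the frame."""
    errors = [np.sum((frame - basis(r - d, n_harmonics, period) @ theta) ** 2)
              for d in candidates]
    return float(candidates[int(np.argmin(errors))])


def estimate_theta(frames, shifts, r, n_harmonics, period):
    """Step (b): least-squares fit over a window of jitter-aligned frames."""
    design = np.vstack([basis(r - d, n_harmonics, period) for d in shifts])
    theta, *_ = np.linalg.lstsq(design, np.concatenate(frames), rcond=None)
    return theta


def clur_step(past_frames, past_shifts, theta_prev, new_frame, r, candidates,
              n_harmonics, period):
    """One typical step: returns jitter estimate, new theta, and residual."""
    d_hat = estimate_jitter(new_frame, theta_prev, r, candidates,
                            n_harmonics, period)
    theta = estimate_theta(past_frames + [new_frame], past_shifts + [d_hat],
                           r, n_harmonics, period)
    clutter = basis(r - d_hat, n_harmonics, period) @ theta  # step (c)
    return d_hat, theta, new_frame - clutter                 # step (d)


# Toy data (hypothetical): strong smooth clutter, jittered frame by frame.
r = np.arange(64.0)
true_theta = np.array([50.0, 20.0, -10.0, 8.0, 5.0, -4.0, 3.0])


def make_frame(shift):
    return basis(r - shift, 3, 64.0) @ true_theta


past_shifts = [0.0, 0.3]
past_frames = [make_frame(d) for d in past_shifts]
theta_pilot = estimate_theta(past_frames, past_shifts, r, 3, 64.0)

new_frame = make_frame(-0.2)
new_frame[20] += 1.0  # weak point target, far below the clutter level
candidates = np.linspace(-1.0, 1.0, 21)
d_hat, theta, residual = clur_step(past_frames, past_shifts, theta_pilot,
                                   new_frame, r, candidates, 3, 64.0)
```

In this toy case the jitter estimate lands on the grid point nearest the true shift of -0.2, and the residual is close to zero everywhere except at pixel 20, where most of the target amplitude survives. Fitting the parameters over a window of several frames, rather than the current frame alone, is what keeps the target from being absorbed into the clutter estimate.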

Our  study  of  various  algorithms  showed  that  the  following  parametric  models  and  corresponding  spatial- 
temporal  filters  are  feasible  for  implementation  in  the  bank  of  CLUR  filters: 

1.  Two-dimensional  Fourier  Series  with  Double  Nyquist  Rate  — “Fourier.” 

2.  Two-dimensional  Wavelet  Series  — “Wavelet.” 

3.  Local  Polynomial  Approximation  — “Pol.” 

4. Spline-based Interpolation Methods with Double Resolution — “Spl-DR.”

5.  Spatial-temporal  Autoregression  —  “STAR.” 
4. Target Detection and Tracking

In  this  section  we  address  target  detection  and  tracking  methods  within  the  conventional  detect-then-track 
approach. 

4.1.  In-Frame  Detection  Algorithms 

Each target detection method specified below solves the problem of in-frame target detection and
estimation of targets’ positions and intensities, i.e., forms the so-called “blips.” Each blip is characterized by the
estimates of both “total” and “effective” signal intensities as well as the target position. It is assumed that a
frame is first whitened by one of the CLUR filtering algorithms. The following algorithms are available for
selection in the “in-frame target detection module”:

• “Simplest” is based upon direct data thresholding. The center of a hot pixel (a pixel whose intensity is
greater than the detection threshold) is regarded as the target position estimate and the corresponding
signal amplitude as the estimate of the target intensity.

•  “Optimum”  is  based  on  the  precise  calculation  of  the  optimum  decision-making  statistic  for  each 
node  (a  square  composed  from  four  pixels)  with  compensation  of  the  signals  from  already  detected 
targets.  It  uses  information  about  the  form  of  the  PSF.  It  allows  for  resolving  two  targets  within  one 
node. 

• “Sub_opt” is the version of the “optimum” algorithm where the optimal decision-making statistic is
replaced by a sub-optimal one calculated approximately at the reference points located on a grid
having a period of 0.5 pixel size (two targets within one node cannot be resolved).

• “Adaptive” is a version of the “sub_opt” algorithm with automatic adjustment of the detection
threshold based on the background estimates. As a result, the density of falsely detected blips is
kept constant across different fragments of a frame.
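
To make the contrast between fixed and adaptive thresholding concrete, the sketch below compares a "simplest"-style detector with a block-wise adaptive one. It is a simplified stand-in, not the algorithms of the detection module: the adaptive rule here scales the threshold by a robust local noise estimate (median absolute deviation), which is an assumption, and the function names, block size, and toy frame are hypothetical.

```python
import numpy as np


def detect_simple(frame, threshold):
    """Return (row, col, amplitude) for every pixel above a fixed threshold."""
    rows, cols = np.nonzero(frame > threshold)
    return [(int(i), int(j), float(frame[i, j])) for i, j in zip(rows, cols)]


def detect_adaptive(frame, block, k):
    """Threshold each block at k times its own robust noise scale."""
    blips = []
    for r0 in range(0, frame.shape[0], block):
        for c0 in range(0, frame.shape[1], block):
            tile = frame[r0:r0 + block, c0:c0 + block]
            med = np.median(tile)
            # Median absolute deviation, scaled to match a Gaussian sigma.
            sigma = 1.4826 * np.median(np.abs(tile - med))
            rows, cols = np.nonzero(tile - med > k * sigma)
            blips += [(r0 + int(i), c0 + int(j), float(tile[i, j]))
                      for i, j in zip(rows, cols)]
    return blips


# Deterministic toy frame: residual fluctuations of size 1 in the left half,
# size 5 in the right half, and one target of amplitude 6 at (2, 4).
i, j = np.indices((8, 16))
frame = np.where((i + j) % 2 == 0, 1.0, -1.0) * np.where(j < 8, 1.0, 5.0)
frame[2, 4] = 6.0

fixed_blips = detect_simple(frame, threshold=4.0)
adaptive_blips = detect_adaptive(frame, block=8, k=3.0)
```

With a fixed threshold of 4, the noisier right half produces 32 false blips in addition to the target. The adaptive detector raises its threshold there and reports only the target.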

4.2.  The  Multitarget  Tracker 

The  tracker  performs  the  following  operations: 

•  Identification/association  of  new  detections  (blips)  with  existing  tracks. 

• Initiation of new tracks based on blips that are not identified as belonging to existing tracks.

•  Confirmation  of  newly  initialized  tracks. 

•  Deletion  of  unconfirmed  tracks. 

•  Termination  of  tracks. 

Track initiation and track termination algorithms are subdivided into two classes: (1) signal-based
procedures that operate with likelihood ratios built on models of intensity distributions around local
detections, and (2) binary algorithms that are based on blips/detections (i.e., binary quantized data).

We propose two signal-based track initiation algorithms based on changepoint detection
methods applied to signals (intensities) in the vicinity of in-frame detections. The first algorithm
is the unlimited CUSUM-type algorithm, and the second one is the window-limited CUSUM test
(WL-CUSUM). In the following, these algorithms will be referred to as “signal-based.”

The same two types of change detection tests are used for binary quantized data (i.e., for a sequence
of detections/blips). These algorithms turn out to be more robust than the signal-based ones,
especially when the Gaussian model assumption does not hold. At the same time, the loss of efficiency is
small (less than 30%) compared to the signal-based approach when the true model is indeed Gaussian.
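
A minimal sketch of the binary versions of the two tests follows. It assumes blips in a gate are independent Bernoulli variables with false-detection probability `p0` before the target appears and detection probability `p1` afterwards. The window-limited variant is written as the maximum of the log-likelihood-ratio sums over start points inside a sliding window, which is one common reading of a window-limited CUSUM. The probabilities, threshold, and window size are hypothetical.

```python
import math


def bernoulli_llr(blip, p0, p1):
    """Log-likelihood ratio of one binary observation (post- vs pre-change)."""
    if blip:
        return math.log(p1 / p0)
    return math.log((1.0 - p1) / (1.0 - p0))


def cusum_stop(blips, p0, p1, threshold):
    """Unlimited CUSUM: first index where the statistic reaches the threshold."""
    stat = 0.0
    for n, b in enumerate(blips):
        stat = max(0.0, stat + bernoulli_llr(b, p0, p1))
        if stat >= threshold:
            return n
    return None


def wl_cusum_stop(blips, p0, p1, threshold, window):
    """Window-limited CUSUM: candidate change points restricted to a window."""
    llrs = [bernoulli_llr(b, p0, p1) for b in blips]
    for n in range(len(llrs)):
        start = max(0, n - window + 1)
        best, run = -math.inf, 0.0
        # Largest sum of LLRs over segments ending at n and starting in window.
        for t in range(n, start - 1, -1):
            run += llrs[t]
            best = max(best, run)
        if best >= threshold:
            return n
    return None


# Hypothetical blip sequence: sporadic false detections, then a target.
blips = [0, 0, 1, 0, 0, 1, 1, 1, 1]
n_cusum = cusum_stop(blips, p0=0.1, p1=0.8, threshold=5.0)
n_wl = wl_cusum_stop(blips, p0=0.1, p1=0.8, threshold=5.0, window=3)
```

Both tests confirm the track at index 7, after three consecutive detections. The isolated blip at index 2 is not enough to cross the threshold.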

At  the  stage  of  tracking,  the  track  is  predicted  to  the  next  step  and  the  corresponding  gate  is  calculated. 
When a target disappears, the distribution of the data in the gates changes abruptly. Therefore, termination of
tracks can be effectively performed based on the quickest detection of the moment of target disappearance,
similarly to track initiation. Note also that a similar approach is used for termination of false tracks.
5. Testing for Geostationary Platforms

We  performed  a  detailed  comparative  study  of  the  “industry  standard”  differencing  method  with  our  clutter 
rejection  techniques.  The  differencing  clutter  rejection  method  simply  subtracts  two  consecutive  frames.  It 
is  therefore  equivalent  to  our  temporal  window-limited  clutter  rejection  filter  (implemented  in  the  bank  of 
CLUR  filters)  with  the  window  size  of  1  frame. 
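
The equivalence can be checked on a toy sequence. The sketch assumes that the temporal window-limited filter estimates the background as the mean of the preceding `window` frames and subtracts it from the current frame. This form is an assumption made for illustration, since the filter is not specified in detail here, and the function name and data are hypothetical.

```python
import numpy as np


def temporal_window_filter(frames, window):
    """Subtract from each frame the mean of the `window` frames before it."""
    frames = np.asarray(frames, dtype=float)
    out = []
    for n in range(window, len(frames)):
        background = frames[n - window:n].mean(axis=0)
        out.append(frames[n] - background)
    return np.array(out)


# Four tiny two-pixel "frames" (hypothetical values).
frames = np.array([[1.0, 2.0], [1.5, 2.5], [3.0, 2.0], [3.0, 4.0]])
filtered_w1 = temporal_window_filter(frames, window=1)
differenced = np.diff(frames, axis=0)
```

With `window=1` the output coincides with frame differencing. Larger windows average the background over more frames, which reduces the noise in the background estimate when the clutter varies slowly in time.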


(a)  Original  frame  (b)  Spatial-temporal  Wavelet  filter  (c)  Differencing  CLUR  algorithm 

Figure 13.2: The results of clutter suppression and target tracking using spatial-temporal Wavelet and
differencing CLUR filters.


We used an image sequence with moderately intense clutter and sensor noise STD $\sigma_N = 3$. Two weak
targets were inserted in the sequence. We first used the Wavelet spatial-temporal filter with a window of
20 frames. The results were very successful: the standard deviation of the residual clutter plus noise was
about 3 and both targets were tracked, as can be seen in Figure 13.2(b). By contrast, the differencing method
was not able to track the targets, as seen from Figure 13.2(c). In the images, squares with no tracks attached
represent instantaneous detections, some of which are false, while solid lines correspond to confirmed target
tracks.

6.  Detection  and  Tracking  by  Fusing  Multiple  Frames  Through  Optimal  Nonlinear  Filtering 

For  moving  platforms,  such  as  low  earth  and  high  elliptic  orbits,  STSS,  shipboard  IRST  and  other  sensors, 
one may expect difficult situations when even after clutter rejection the effective SNR is low, so that
in-frame detection with a given acceptable FAR is not possible. Choosing lower detection thresholds is not an
answer,  since  threshold  lowering  results  in  an  intense  flow  of  false  detections  and  initiation  of  multiple  false 
tracks,  which  in  turn  leads  to  tracker  failures.  To  overcome  this  difficulty  we  propose  track-before-detect 
(TBD)  algorithms.  These  algorithms  process  a  series  of  frames  simultaneously  without  detecting  targets  in 
each  and  every  frame,  i.e.,  they  accumulate  information  along  hypothetical  tracks. 

The proposed TBD architecture is based on optimal nonlinear filtering (ONF) for switching multiple
models. We show that the optimal TBD algorithm can be represented in the form of a bank of
interacting nonlinear Bayesian matched filters. The output of this bank is the unnormalized posterior density for
the target position given previous data. The TBD algorithm is recursive and does not require pruning of
hypotheses. An important feature of the algorithm is that it does not require linear dependence between
observations and the state process. This allows one to build a TBD algorithm that handles highly nonlinear
$\delta$-like target signatures characteristic of small targets in IR images. This algorithm is a particular case of a
more general approach for spatiotemporal nonlinear filtering outlined in Chapter 2, Section 2.
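
For orientation, a generic discrete-time recursion of this type can be written as follows. It is the standard Bayesian prediction-update step for a Markov state, given here as an assumption about the general form rather than as the exact construction of Chapter 2. The symbols are introduced for illustration only: $x$ is the joint state (target position and motion model), $\pi(x \mid x')$ is the transition probability of the switching Markov model, $\ell_n(Z_n \mid x)$ is the likelihood (or likelihood ratio) of frame $Z_n$ for a target in state $x$, and $p_n(x)$ is the unnormalized posterior.

```latex
p_n(x) \;=\; \ell_n(Z_n \mid x) \sum_{x'} \pi(x \mid x')\, p_{n-1}(x')
```

Because each step only multiplies and sums nonnegative weights over the state grid, no hypotheses have to be pruned, and normalization is not needed to locate the maximum of $p_n$.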

In certain important scenarios, tracking is not possible with either the conventional detect-then-track scheme
or the bank of 3D matched filters (3DMF), in which case NLF-based TBD
methods are needed. We now present a comparative analysis with the industry-standard 3DMF bank
for a challenging data set that contains strong sea glint.

As possible target dynamics we chose 16 different directions of motion (throughout a 360 degree angle)
and a wide range of velocities (from 1 to 5 pixels per frame). Only transitions to neighboring
(direction-wise and speed-wise) states were allowed by the Markov model, with probabilities of switch equal to 0.1
for every direction of change. Probabilities of changes in direction and speed were considered separately,
and the probabilities of combined direction/speed transitions were obtained by multiplying the two separate
probabilities. The target signal was simulated by a square of 3 × 3 pixels with an average per-pixel SNR of
approximately 0.2 (−7 dB) for the sea glint area, about 1-2 (0 to 3 dB) for the rest of the sea area, and
about 0.1 (−10 dB) for some of the artifacts inside the coastal area at the bottom of the scene. In order to
illustrate  the  power  of  the  developed  ONF-based  TBD  algorithm,  no  background  removal  was  performed  on 
the  images  before  the  tracking,  which  complicates  the  problem  tremendously.  The  target  enters  the  sea-glint 
area  at  frame  16.  First,  it  moves  along  a  straight-line  trajectory.  Upon  entering  the  glint  area  the  target  starts 
a  slight  maneuver,  decreasing  the  vertical  component  of  its  velocity  by  1  pixel  per  frame.  At  frame  26  the 
maneuver  is  terminated  and  the  target  proceeds  to  move  along  a  straight-line  trajectory. 
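
The switching model described above can be assembled as a product of two small Markov chains. The sketch below is a plausible reading of that description rather than the exact model used in the experiment. In particular, the handling of the lowest and highest speeds, where one neighbor is missing and its probability is folded into staying put, is an assumption, and the function names are hypothetical.

```python
import numpy as np


def direction_chain(n_dir=16, p=0.1):
    """Circular chain: turn to either neighboring direction with probability p."""
    P = np.zeros((n_dir, n_dir))
    for d in range(n_dir):
        P[d, (d + 1) % n_dir] = p
        P[d, (d - 1) % n_dir] = p
        P[d, d] = 1.0 - 2.0 * p
    return P


def speed_chain(n_speed=5, p=0.1):
    """Linear chain over speeds 1..n_speed pixels per frame."""
    P = np.zeros((n_speed, n_speed))
    for s in range(n_speed):
        if s > 0:
            P[s, s - 1] = p
        if s < n_speed - 1:
            P[s, s + 1] = p
        # Assumption: at the boundary the missing move is added to "stay".
        P[s, s] = 1.0 - P[s].sum()
    return P


# Joint model: independent direction and speed changes, so the joint
# transition probabilities are products (state index = direction * 5 + speed).
joint = np.kron(direction_chain(), speed_chain())
```

The joint matrix has 80 states. A simultaneous change of direction and speed, for example from (direction 0, speed index 2) to (direction 1, speed index 3), has probability 0.1 × 0.1 = 0.01.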


(a)  Target  tracking  by  ONF-TBD  (b)  Target  tracking  by  3DMF 


Figure  13.3:  The  results  of  target  tracking  in  video. 

The screenshots with the tracking results for ONF-TBD and 3DMF are shown in Figure 13.3. This data
set represents a video stream that was obtained with a Raytheon VOx long-wave 320 × 240 imager pointed
at sun-glint over the Pacific Ocean. The solid red line corresponds to the true target trajectory, while the
blue  “+”  line  corresponds  to  the  estimated  trajectory.  Once  the  ONF-TBD  finds  the  target  it  tracks  it  well, 
even  in  sea  glint  and  when  it  performs  a  maneuver.  On  the  other  hand,  the  3DMF  bank  tracks  the  target  well 
only  until  the  target  enters  the  sea  glint  area.  In  sea  glint,  it  immediately  loses  the  target. 

Based on the experimental results we can conclude that (1) the popular 3DMF bank has poor
performance when detecting dim maneuvering targets, and (2) the proposed ONF-based TBD algorithm is very
robust with respect to both target maneuvering and very low SNR.
Chapter 14

Testing  and  Validation:  A  Disaster  Scenario 


1.  Creation  of  a  Disaster  Scenario  Combining  all  Parts  Together 

We have created a scenario illustrating how all parts of the MURI project could fit together in a realistic
setting. To this end, we ran an undergraduate REU project based at UCLA called “Disaster LA” in which
the students developed the details of the disaster plot. In order to construct a reasonable scenario and also
describe a realistic response from the local authorities, the students have been in contact with the Los
Angeles Police Department’s Real-time Analysis and Critical Response (RACR) division, which is tasked with
analyzing incoming crime and emergency data and developing appropriate responses. The officers working
in RACR have been instrumental in aiding in the development of the scenario’s timeline, and in determining
how the various technologies may be useful to such a government agency.

The basic element of the disaster scenario involves the terrorist detonation of multiple high explosives
in various areas of Los Angeles, and the subsequent aftermath. The attack actually begins, however, with a
Denial of Service (DOS) attack on the internet communication capabilities of the LAPD and other
emergency responders. The students then imagine that conventional, chemical (in the form of a noxious gas),
and nuclear (a “dirty bomb”) explosions have been set off in areas such as Los Angeles International Airport
(LAX), downtown LA, and Westwood (where UCLA is located). Following these detonations, the
authorities must use the relevant technologies to safely secure the dangerous areas, help people evacuate to safe
locations, stop any potential civil unrest, and identify and apprehend the terrorists involved in the plot.

To accomplish these goals, the authorities are first imagined to be using state-of-the-art DOS detection
algorithms to block the internet attack and keep their information channels open and working. Though the
bombs are successfully detonated by the terrorists, the police are able to respond quickly to the areas, and
use swarms of autonomous robots to detect the boundaries of the dangerous areas (those with high radiation
or containing poisonous gases) so that civilians can be relocated outside of these danger zones. To deal with
possible panic in the general population, the police use real-time crime mapping and prediction algorithms
to send their already stressed forces to only those locations automatically designated as most dangerous. In
order to identify the suspects involved in the attack, the authorities use video footage captured from a number
of networks of autonomous cameras located throughout the city, each running video-tracking algorithms to
identify suspicious behavior (an individual leaving behind a briefcase in LAX, for example) and follow the
individuals involved. These suspects are then fed into a program that keeps track of known terrorists and
their associates to identify who among them are likely perpetrators of this type of attack.

Though the scenario envisions technologies that are, in some cases, not yet developed for field work,
each  of  the  projects  has  been  simulated  on  a  computer  to  provide  proof-of-concept  for  the  scenario.  In  doing 
so,  the  students  have  created  helpful  software  packages  and,  in  some  cases,  improvements  in  the  algorithms 
or  the  analysis  of  data  that  underlies  them.  In  this  way,  the  students  have  not  only  created  an  interesting 
presentation  that  showcases  the  technologies  and  their  potential,  but  also  aided  in  the  progress  needed  to 
achieve  the  level  of  capabilities  they  have  imagined. 
Chapter 15

Technology  Transfer 


1.  Transition  to  China  Lake  Naval  Weapons  Center 

Unmanned air systems and ground and underwater vehicles now provide a range of important functions in
naval operations. Fielded systems, however, require extensive teams of highly trained operators and
support personnel. DOD requirements and science and technology priorities address the need for increased
autonomy, with the warfighter monitoring and supervising multiple systems to effectively leverage assets
while reducing individual workload and numbers of personnel. Co-PI Bertozzi served as a consultant on
an ONR-funded project that developed cooperative control algorithms for unmanned vehicles and suggested that
the project use the algorithm developed in [93], funded by this MURI award. The searching algorithm has
now been ported to the Multiple Unified Simulation Environment (MUSE) in use at the Naval Air Warfare
Center Weapons Division (NAWCWD), China Lake. MUSE integrates ground station control with the respective
platform six-degree-of-freedom simulation and sensor models and employs MetaVR scene generation to support a
number of UAVs. This enables high-fidelity simulations of larger systems like the RQ/MQ-1 Predator and
MQ-9 Reaper, as well as smaller systems like the RQ-7 Shadow. Often used in training applications, MUSE
was adapted to incorporate new functions and capabilities specific to integrated systems test and evaluation.

Source code for random search based on Levy processes was provided to NAWCWD at China Lake. The
researchers at China Lake implemented the code on two different simulation platforms. The first simulation
environment was a freeware flight simulator that provided faster-than-real-time simulation conditions.
The second was the MUSE code package, which provided realistic simulation for multiple UAV
platforms. The capability of a group of random searchers to find stationary and moving targets using the
algorithm provided was simulated in both the flight simulator and MUSE. It can be shown that optimal
search performance for stationary targets with no a priori information is achieved by a deterministic search, and it was
shown that the random search achieved near-optimal search performance. Moreover, the random search
found moving targets in a shorter time span than the deterministic code. These results have led to continued
work at NAWCWD-China Lake in the area of autonomous search.

Points  of  contact  at  China  Lake:  Arjuna  Flenner  and  Katia  Estabridis. 

2.  Transition  to  the  Los  Angeles  Police  Department  (LAPD) 

2.1. Geographic Profiling

A  prototype  tool  for  geographic  profiling  of  serial  criminal  offenders  was  transferred  to  LAPD  in  September 
2009.  This  tool  is  based  on  the  algorithm  for  Geographic  Profiling  developed  by  the  2009  LAPD-RIPS 
team  and  has  a  user  interface  based  around  Google  Earth.  Geographic  profiles  score  the  likelihood  that  a 
serial offender has a key anchor point in a given location. This tool is innovative in that it incorporates
information about local demographic and spatial features of the environment into the profile. Subsequent
interactions with LAPD indicate that the tool is used with the Real-time Analysis and Critical Response
Division (RACR) at LAPD. A similar prototype tool for infilling information on suspected gangs involved
in gang shootings was provided to LAPD by the 2010 LAPD-RIPS team. This tool examines a given crime
and, based on rivalry histories and the location and timing of the crime, probabilistically scores which of many
alternative gangs is a likely suspect gang.

2.2.  Crime  Mapping 

The paper [155] was the impetus for a summer REU project at IPAM as part of their RIPS program. Lead
author Laura Smith (finishing PhD student) was the mentor for this project. The students wrote a suite of
crime mapping software for LAPD based on their current crime briefing strategies and data collection. The
students implemented methods from the above paper along with more standard kernel estimation methods.
For the specific police data needs, as discussed below, the kernel methods worked as well as the new method;
however, there are many crime mapping applications where the standard method is much less effective, and
these are illustrated in the manuscript. The RIPS code development was funded by an NSF grant to IPAM.
The project was specifically chosen based on the work in the above paper and the plan was to implement
these and other algorithms for a real police tool application. Code has been given to LAPD for use in an
upcoming predictive policing experiment.

UCLA co-PI Brantingham is directly involved in this planned predictive policing experiment. The intention
is to have a control group and a target area so that the effectiveness of the crime algorithms
for crime reduction can be tested against current LAPD practices. The plan is to work the tool into LAPD's daily
war room process. Laura Smith and the team built the interface to mirror the department's current process of
looking at 1-, 3- and 7-day retrospective maps at the daily war room crime briefing. When LAPD launches the test, it
will use the 1-, 3- and 7-day forecast density maps to plan deployment of both regular patrol resources and
specialized units or discretionary resources. LAPD hopes to have results by the end of the year.
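
The 1-, 3- and 7-day density maps mentioned above can be illustrated with a standard fixed-bandwidth
Gaussian kernel estimate. The Python sketch below is only an illustration under stated assumptions and
is not the RIPS code: the function names, the planar coordinates, the bandwidth and the half-open day
window are all hypothetical choices made for the example.

```python
import math
from datetime import date, timedelta


def events_in_window(events, as_of, days):
    """Return (x, y) locations of events dated within the last `days` days.

    `events` is a list of (date, x, y) tuples (hypothetical format).
    The window is (as_of - days, as_of], an assumption made for this sketch.
    """
    start = as_of - timedelta(days=days)
    return [(x, y) for (d, x, y) in events if start < d <= as_of]


def gaussian_kde_2d(points, x, y, bandwidth):
    """Fixed-bandwidth 2-D Gaussian kernel density estimate at (x, y)."""
    if not points:
        return 0.0
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
    return norm * total


def retrospective_map(events, as_of, days, grid, bandwidth):
    """Evaluate the density of recent events at every (x, y) grid point."""
    recent = events_in_window(events, as_of, days)
    return [gaussian_kde_2d(recent, gx, gy, bandwidth) for gx, gy in grid]
```

Calling `retrospective_map` with `days` set to 1, 3 and 7 on the same event list gives three maps of the
kind reviewed at a briefing. A fixed bandwidth is the simplest choice; it ignores the geographic features
that the method of [155] is designed to take into account.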

Point  of  contact  at  LAPD:  Captain  Sean  Malinowski,  sean.malinowski@lapd.lacity.org. 

3.  Transition  to  DOE 

The developed changepoint detection methods and algorithms, particularly the decentralized methods, can be
used effectively to design working prototypes of systems for detecting and
tracking terrorist activities and for detecting and isolating computer network intrusions. The advanced
changepoint detection algorithms developed at the University of Southern California (Prof. Tartakovsky) for
an anomaly-based Intrusion Detection System (IDS) have been transferred to the Oak Ridge National Laboratory for
tuning and testing over the UltraScience Net (USN) network research testbed for operation at 1-10 Gbps rates
(PI Dr. Rao of the DOE Oak Ridge National Lab). Preliminary testing shows that this statistical anomaly-based
IDS provides reliable and rapid detection of certain classes of attacks with a very low false alarm rate;
these attacks are not easily amenable to detection by existing signature-based methods.
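
As a minimal illustration of sequential changepoint detection applied to traffic anomalies, the Python
sketch below implements a textbook one-sided CUSUM test on a stream of per-interval counts. It is not
necessarily the algorithm that was transferred to Oak Ridge; the function name, the parameters
(`baseline_mean`, `drift`, `threshold`) and the sample values are assumptions made for the example.

```python
def cusum_alarm(observations, baseline_mean, drift, threshold):
    """One-sided CUSUM test for an upward shift in the mean.

    The statistic accumulates the excess of each observation over
    baseline_mean + drift and is reset to zero whenever it goes negative.
    Returns the index of the first observation at which the statistic
    reaches `threshold`, or None if no alarm is raised.
    """
    stat = 0.0
    for n, x in enumerate(observations):
        stat = max(0.0, stat + (x - baseline_mean - drift))
        if stat >= threshold:
            return n
    return None


# Hypothetical packet counts per interval: normal traffic, then a surge.
counts = [10, 10, 10, 10, 20, 20, 20]
alarm_index = cusum_alarm(counts, baseline_mean=10, drift=2, threshold=15)
```

The threshold trades detection delay against false alarm rate: raising it makes false alarms rarer at the
cost of a longer delay before a real change is declared.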

4.  Transition  to  MDA  and  Air  Force 

The advanced target detection and tracking system (developed by Prof. Tartakovsky), which includes a bank
of clutter rejection filters based on spatial-temporal image processing algorithms, has been transferred to
MDA and the Hanscom Air Force Base for evaluation and testing (POC James Brown). The developed
algorithms guarantee very high detection/tracking performance for small ballistic targets (at boost and
midcourse stages), allowing almost complete suppression of solar cloud clutter in space-based IR sensors (SBIRS
HIGH). Such performance cannot be achieved with existing industry-standard methods.

5.  Transition  to  Commercial  Companies 

Prof. Papadopoulos continues discussions and collaboration with ESoft. ESoft is very interested in our
approach to characterizing the IP addresses of spammers, and wants us to investigate approaches by which it can
classify entire blocks of addresses as suspicious. ESoft produces appliances that sit in a customer's network,
provide information, receive constant updates from the central office about new vulnerabilities,
and send back reports with alerts. ESoft is looking at new algorithms to go into the appliance, and
one of them may be our approach to characterizing addresses that send spam.

Prof. Medioni has been in discussion with a company named InnoVision Optics, Co. about transferring
our TNT techniques for use in a multi-camera system. InnoVision Optics provides optical capturing and camera
support for many industries, such as movie production. Its current sophisticated multi-camera capturing
system requires intensive human interaction to track a tagged object of interest, such as the basketball
in a game. The company plans to use an automatic (or minimal human interaction) tracking solution to remove human
operators from the control loop and achieve more efficient multi-camera collaboration.

Video  tracking  algorithms  developed  at  USC  (Prof.  Medioni)  have  been  transferred  to  Torrey  Pines 
Logic,  Inc.,  San  Diego,  CA  for  embedding  into  a  shipboard  IRST  testbed  and  testing  against  real  IR  data 
sets  (collected  by  the  Navy)  that  contain  multiple  extended  targets  barely  visible  in  sea-glint. 